Abstract

We consider a model of a population of fixed size $N$ undergoing selection. Each individual
acquires beneficial mutations at rate $\mu_N$, and each beneficial mutation increases the
individual’s fitness by $s_N$. Each individual dies at rate one, and when a death occurs, an individual
is chosen with probability proportional to the individual’s fitness to give birth. Under certain
conditions on the parameters $\mu_N$ and $s_N$, we show that the genealogy of the population can be
described by the Bolthausen-Sznitman coalescent. This result confirms predictions of Desai,
Walczak, and Fisher (2013), and Neher and Hallatschek (2013).


1 Introduction 

In population genetics, one is often interested in understanding the genealogical structure of 
a population. That is, we take a sample of individuals from a population at some time and 
trace their ancestral lines backwards in time. As we trace the ancestral lines backwards in time, 
the lineages will merge until eventually all sampled individuals are traced back to one common 
ancestor. For many standard population models, including the classical Moran model [?], the
genealogy of the population is best described by a process known as Kingman’s coalescent, which
was introduced in [?]. Kingman’s coalescent is the coalescent process in which only two lineages
ever merge at one time and each pair of lineages merges at rate one.

For populations undergoing selection, Kingman’s coalescent does not always provide an
adequate description of the genealogy of the population. If one individual acquires a beneficial
mutation which then spreads rapidly to a large fraction of the population, many ancestral lines
could merge nearly at once because they all get traced back to the individual that acquired the
beneficial mutation. As a result, the genealogy of the population is best described by a coalescent
process that permits more than two lineages to merge at one time. Such processes, known as
coalescents with multiple mergers or $\Lambda$-coalescents, were introduced by Pitman [22] and Sagitov [26]
and have been studied extensively in the probability literature in recent years. For previous work
in which coalescents with multiple mergers were used to describe the genealogy of populations
undergoing selection, see [?].

In this paper, we will consider the following population model. The population has fixed size
$N$. Each individual independently acquires mutations at times of a Poisson process with rate
$\mu_N$. All mutations are assumed to be beneficial, and the fitness of each individual depends on
how many mutations the individual has acquired, relative to the mean of the population. More
precisely, let $X_j(t)$ be the number of individuals with $j$ mutations at time $t$, which we call type
$j$ individuals, and let

$$M(t) = \frac{1}{N} \sum_{j=0}^{\infty} j X_j(t)$$

be the average number of mutations carried by the individuals in the population at time $t$. Then
the fitness of an individual with $j$ mutations at time $t$ is defined to be

$$\max\{0, 1 + s_N (j - M(t))\}.$$

Note that the parameter $s_N$ measures the selective advantage that an individual gets from each
mutation. As in the Moran model, each individual independently lives for an exponentially
distributed time with mean one. When an individual dies, it gets replaced by a new individual whose
parent is chosen at random from the population. The probability that a particular individual is
chosen as the parent is proportional to that individual’s fitness, and the new individual inherits
all of its parent’s mutations.
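
The dynamics just described can be sketched as a small event-driven simulation. This is only an illustrative sketch, not code from the paper: the function names `fitnesses` and `step`, the list-of-mutation-counts representation, and the parameter values in the usage note are assumptions made for this example.

```python
import random


def fitnesses(mutation_counts, s):
    """Fitness max{0, 1 + s (j - M)} of each individual, where M is the mean mutation count."""
    mean = sum(mutation_counts) / len(mutation_counts)
    return [max(0.0, 1.0 + s * (j - mean)) for j in mutation_counts]


def step(mutation_counts, s, mu, rng):
    """Apply one mutation or death-and-replacement event in place; return the elapsed time."""
    n = len(mutation_counts)
    # Each individual mutates at rate mu and dies at rate 1, so events arrive at rate n * (1 + mu).
    waiting_time = rng.expovariate(n * (1.0 + mu))
    target = rng.randrange(n)  # the individual affected by the event, chosen uniformly
    if rng.random() < mu / (1.0 + mu):
        # Mutation event: the chosen individual gains one beneficial mutation.
        mutation_counts[target] += 1
    else:
        # Death event: the parent is chosen with probability proportional to fitness,
        # and the newborn inherits all of the parent's mutations.
        weights = fitnesses(mutation_counts, s)
        parent = rng.choices(range(n), weights=weights)[0]
        mutation_counts[target] = mutation_counts[parent]
    return waiting_time
```

For instance, starting from `pop = [0] * 50` and calling `step(pop, 0.05, 0.01, random.Random(0))` repeatedly evolves a hypothetical population of 50 individuals while keeping its size fixed.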

This model was studied in great detail using nonrigorous methods by Desai and Fisher [10],
who obtained results concerning the rate of adaptation, meaning the rate at which the mean
fitness $M(t)$ grows as a function of time, as well as the distribution of the fitnesses of individuals
in the population at a given time. See also [?] for related results, and see [28] for a
good summary of the literature on this model and closely related models. The genealogy of the
population in this model has been studied only within the past few years. Desai, Walczak, and
Fisher [11] argued that the genealogy of the population can be described by a process called
the Bolthausen-Sznitman coalescent, which we will define precisely in section 2. Neher and
Hallatschek [21] arrived at the same conclusion for a slightly different model.

This model was also studied in detail in the paper [27], which contains rigorous proofs of
the results of Desai and Fisher [10] concerning the rate of adaptation and the distribution of
fitnesses of individuals in the population. In the present paper, which is a sequel to [27], we
build on the techniques developed in [27] to provide a mathematically rigorous description of
the genealogy of the population. We confirm nonrigorous predictions presented in [11, 21] and
show that the genealogy of the population is given by the Bolthausen-Sznitman coalescent, under
suitable conditions on the parameters $s_N$ and $\mu_N$.

The rest of this paper is organized as follows. In section 2, we state precisely our assumptions
and the main result of the paper, which is Theorem 2.1 below. In section 3, we give a heuristic
argument that explains the ideas behind why Theorem 2.1 is true, and we make some connections
with other results in the literature. In section 4, we summarize the results from [27] that will be
needed in the present paper. The remaining sections are devoted to proving Theorem 2.1.


2 Assumptions and Main Result 


We first define the following two quantities, which were also used in [27] and which are important
for scaling the process correctly:

$$k_N = \frac{\log N}{\log(s_N/\mu_N)}, \qquad a_N = \frac{\log(s_N/\mu_N)}{s_N}. \tag{2.1}$$

As we will see below, $k_N$ is the natural scale for the number of mutations because the difference
in the number of mutations carried by the fittest individual in the population and an individual
of average fitness is typically within a constant multiple of $k_N$. Also, we will see that $a_N$ is the
natural time scale on which to study the process.
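
To get a feel for the two scales in (2.1), the following sketch evaluates them for concrete parameters. The function name `scaling_constants` and the numerical values in the usage note are hypothetical choices for illustration; they are not taken from the paper and need not lie in the asymptotic regime the paper studies.

```python
import math


def scaling_constants(N, s, mu):
    """Return (k_N, a_N) as defined in (2.1) for population size N and parameters s, mu."""
    if not (N > 1 and 0 < mu < s):
        raise ValueError("expected N > 1 and 0 < mu < s")
    log_ratio = math.log(s / mu)
    k_N = math.log(N) / log_ratio  # scale for the number of mutations
    a_N = log_ratio / s            # natural time scale
    return k_N, a_N
```

With the hypothetical values $N = 10^{12}$, $s = 10^{-2}$, and $\mu = 10^{-6}$, this gives $k_N = 3$ and $a_N = 100 \log(10^4)$, which is roughly 921 time units.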

We will need the following assumptions on the parameters $s_N$ and $\mu_N$, which are identical to
the three assumptions that appeared in [27]:


A1: We have

$$\lim_{N \to \infty} \frac{k_N}{\log(1/s_N)} = \infty.$$

A2: We have

$$\lim_{N \to \infty} \frac{k_N \log k_N}{\log(s_N/\mu_N)} = 0.$$

A3: We have $\lim_{N \to \infty} s_N k_N = 0$.


Dividing A3 by A1, we get

$$\lim_{N \to \infty} s_N = 0. \tag{2.2}$$

Also, as noted in [27], these assumptions imply that for all $a > 0$, we have

$$\lim_{N \to \infty} \frac{\mu_N}{s_N^a} = \lim_{N \to \infty} \frac{1}{\mu_N N^a} = 0, \tag{2.3}$$

which means the mutation rate $\mu_N$ tends to zero faster than any power of $s_N$ but more slowly
than any power of $1/N$.

In view of (2.2), assumption A1 implies that $\lim_{N \to \infty} k_N = \infty$. This means that the difference
between the number of mutations carried by the fittest individual and the number carried by an
individual of average fitness tends to infinity as $N \to \infty$. Because each additional mutation
adds $s_N$ to the fitness of an individual, assumption A3 implies that the difference in fitness
between these two individuals tends to zero as $N \to \infty$. As discussed in [27], assumption A2
ensures that mutations do not happen too fast for the analysis in this paper and [27] to be valid.
Understanding how the population evolves under faster mutation rates is an important question
for future work.

Although the parameters $\mu_N$ and $s_N$ depend on $N$, we will drop the subscripts and write $\mu$
and $s$ throughout the rest of the paper to lighten notation.

Before stating the main result, we need to define the Bolthausen-Sznitman coalescent, which
was introduced in [5]. The Bolthausen-Sznitman coalescent is a continuous-time Markov chain
$(\Pi(t), t \ge 0)$ taking its values in the set of partitions of $\{1, \dots, n\}$. It is defined by the property
that $\Pi(0) = \{\{1\}, \dots, \{n\}\}$ is the partition of $1, \dots, n$ into singletons, and then whenever the
partition has $b$ blocks, each possible transition that involves merging $k$ of the blocks into one,
where $2 \le k \le b$, happens at rate

$$\lambda_{b,k} = \int_0^1 y^{k-2} (1-y)^{b-k} \, dy, \tag{2.4}$$

and these are the only possible transitions. A more detailed construction of the Bolthausen-
Sznitman coalescent will be given shortly in section 3.1.
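
The integral in (2.4) is a Beta integral, so $\lambda_{b,k} = (k-2)!\,(b-k)!/(b-1)!$. The sketch below evaluates these rates exactly; the function names are illustrative assumptions. Summing over all $\binom{b}{k}$ choices of blocks, each term equals $b/(k(k-1))$, so the total merger rate out of a state with $b$ blocks telescopes to $b - 1$, which the code can be used to confirm.

```python
from fractions import Fraction
from math import comb, factorial


def merger_rate(b, k):
    """Exact value of the integral in (2.4): the rate at which a given set of k out of b blocks merges."""
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    # Beta integral: B(k - 1, b - k + 1) = (k-2)! (b-k)! / (b-1)!
    return Fraction(factorial(k - 2) * factorial(b - k), factorial(b - 1))


def total_merger_rate(b):
    """Total rate of leaving a partition with b blocks, summing over all sets of merging blocks."""
    return sum(comb(b, k) * merger_rate(b, k) for k in range(2, b + 1))
```

For example, `merger_rate(3, 2)` is `Fraction(1, 2)` and `total_merger_rate(5)` is `4`.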

Theorem 2.1. Assume A1-A3 hold. Fix positive real numbers $t$ and $T$ such that $t > 0$ and
$T > t + 2$. Fix a positive integer $n$, and sample $n$ individuals at random from the population at
time $a_N T$. For $0 \le u \le t + 1$, let $\Pi_N(u)$ be the partition of $\{1, \dots, n\}$ such that $i$ and $j$ are in
the same block of the partition if and only if the $i$th and $j$th sampled individuals have the same
ancestor in the population at time $a_N(T - u)$. Then

$$\lim_{N \to \infty} P(\Pi_N(1) = \{\{1\}, \dots, \{n\}\}) = 1. \tag{2.5}$$

Also, the finite-dimensional distributions of $(\Pi_N(1 + u), 0 \le u \le t)$ converge as $N \to \infty$ to the
finite-dimensional distributions of the Bolthausen-Sznitman coalescent.

Note that Theorem 2.1 stipulates that with probability tending to one as $N \to \infty$, the sampled
individuals at time $a_N T$ will all be descended from different ancestors at time $a_N(T - 1)$. However,
as the ancestral lines are traced back further, the merging of these ancestral lines obeys the law
of the Bolthausen-Sznitman coalescent. This result also appears in [11], where it was obtained
by nonrigorous methods.


3 Heuristics and Background 

3.1 The Bolthausen-Sznitman coalescent 

Recall that the Bolthausen-Sznitman coalescent is the coalescent process whose transition rates
are given by (2.4). Pitman [22] showed how to construct the Bolthausen-Sznitman coalescent
from a Poisson process. We give a variation of this construction here. Consider a Poisson process
on $[0, \infty) \times (0, 1] \times [0, 1]^n$ with intensity

$$dt \times y^{-2} \, dy \times dz_1 \times \dots \times dz_n.$$

Let $\Pi(0) = \{\{1\}, \dots, \{n\}\}$ be the partition of $1, \dots, n$ into singletons. If $(t, y, z_1, \dots, z_n)$ is a
point of the Poisson process, and if the blocks of the partition $\Pi(t-)$, ranked in order by their
smallest elements, are $B_1, \dots, B_b$, then $\Pi(t)$ is the partition obtained from $\Pi(t-)$ by merging
together all of the blocks $B_i$ for which $z_i \le y$.

Informally, this means that if $(t, y)$ are the first two coordinates of a point of the Poisson
process, then at time $t$ we have a so-called $y$-merger, in which each block independently participates
in the merger with probability $y$. If $\Pi(t-)$ has $b$ blocks, then for $2 \le k \le b$, the probability that a
particular set of $k$ blocks merges into one is $y^k (1 - y)^{b-k}$. Integrating this probability against
the intensity $y^{-2} \, dy$ allows us to recover the formula (2.4) for the transition rates.

To see that the construction above is well-defined, note that a point $(t, y, z_1, \dots, z_n)$ of the
Poisson process can only produce a merger at time $t$ if at least two of $z_1, \dots, z_n$ are less than or
equal to $y$. The rate at which such points appear is bounded above by

$$\int_0^1 y^{-2} \binom{n}{2} y^2 \, dy < \infty.$$

Therefore, only finitely many such points will appear in any bounded time interval, and the
construction above can be carried out by considering these points in order by their time coordinate.
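
The effect of a single point $(t, y, z_1, \dots, z_n)$ on the partition can be sketched as follows. This is a minimal illustration of the merging rule only; the function name `apply_y_merger` and the representation of partitions as lists of frozensets are assumptions made for this example, and sampling the Poisson points themselves is not shown.

```python
def apply_y_merger(blocks, y, z):
    """Merge every block B_i with z_i <= y, where blocks are ranked by their smallest elements."""
    ranked = sorted((frozenset(block) for block in blocks), key=min)
    if len(z) < len(ranked):
        raise ValueError("need one coordinate z_i per block")
    chosen = [i for i in range(len(ranked)) if z[i] <= y]
    if len(chosen) < 2:
        return ranked  # fewer than two participating blocks: the partition is unchanged
    merged = frozenset().union(*(ranked[i] for i in chosen))
    kept = [ranked[i] for i in range(len(ranked)) if i not in chosen]
    # Return the new partition, again ranked by smallest elements.
    return sorted(kept + [merged], key=min)
```

For example, applying a merger with $y = 0.5$ and $z = (0.2, 0.9, 0.4, 0.7)$ to the singletons of $\{1, 2, 3, 4\}$ merges the first and third blocks and leaves the other two untouched.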

We now give a heuristic argument to explain when the Bolthausen-Sznitman coalescent should
be expected to describe the genealogy of a population. Note that if a population has size $S$ and
then a new large family of size $Sx$ suddenly appears, then the fraction of the population belonging
to the large family will be $x/(1 + x)$. Consequently, if we are tracing ancestral lines backwards
in time, approximately a fraction $x/(1 + x)$ of the lineages will coalesce around the time that
this family appears. That is, we will have a $y$-merger with $y = x/(1 + x)$. For the Bolthausen-
Sznitman coalescent, we can see from the Poisson process construction above that $y$-mergers with
$y \ge x/(1 + x)$ occur at rate

$$\int_{x/(1+x)}^1 y^{-2} \, dy = x^{-1}. \tag{3.1}$$

Therefore, the Bolthausen-Sznitman coalescent will describe the genealogy of a population when
families of size $Sx$ or larger appear at a rate proportional to $x^{-1}$.


3.2 A heuristic argument for Theorem 2.1

In this subsection, we give a short approximate calculation to suggest why Theorem 2.1 should
be true. For $j \in \mathbb{N}$, let

$$\tau_j = \inf\{t : X_{j-1}(t) \ge s/\mu\} \tag{3.2}$$

be the first time that there are at least $s/\mu$ individuals in the population with $j - 1$ mutations.
It was shown in [27] that typically no individual acquires a $j$th mutation until after time $\tau_j$.
We write for now $q_j = j - M(\tau_j)$, which is the difference between $j$ and the mean number of
mutations carried by the individuals in the population at time $\tau_j$. As argued in [?], shortly
after time $\tau_j$, the number of type $j - 1$ individuals in the population is growing approximately
exponentially at the rate $s(q_j - 1)$, which means that when $t$ is slightly larger than $\tau_j$, we have

$$X_{j-1}(t) \approx \frac{s}{\mu} e^{s(q_j - 1)(t - \tau_j)}. \tag{3.3}$$

Because each type $j - 1$ individual independently acquires mutations at rate $\mu$, at time $u$ we have
type $j$ individuals appearing due to a mutation at rate $\mu X_{j-1}(u)$. If such a mutation happens
at time $u$, then because type $j$ individuals have a selective advantage of approximately $s q_j$ over
the rest of the population, the expected number of descendants of this mutation alive at time $t$
is approximately $e^{s q_j (t - u)}$. Therefore, using (3.3),

$$X_j(t) \approx \int_{\tau_j}^t \mu \cdot \frac{s}{\mu} e^{s(q_j - 1)(u - \tau_j)} \cdot e^{s q_j (t - u)} \, du = s e^{s q_j (t - \tau_j)} \int_{\tau_j}^t e^{-s(u - \tau_j)} \, du \approx e^{s q_j (t - \tau_j)}. \tag{3.4}$$


Usually, the type $j$ individuals will belong to many small families. That is, many type $j - 1$
individuals will acquire mutations, each of which will become the ancestor of only a small fraction
of the type $j$ population. In that case, the approximation in (3.4) will be valid. However,
occasionally there can be an unusually early mutation, when a type $j - 1$ individual acquires
a $j$th mutation much sooner than expected. When this occurs, the descendants of the new
type $j$ individual can eventually constitute a significant fraction of the type $j$ individuals in the
population. These unusually large families can lead to multiple mergers of ancestral lines, as
many lineages get traced back to the individual that got the early mutation.

To estimate the probability that this happens, we approximate $q_j - 1$ by $q_j$ in (3.3) to see that
at time $u$, mutations from type $j - 1$ to type $j$ are occurring at rate approximately $s e^{s q_j (u - \tau_j)}$.
If such a mutation does occur, then the number of descendants of this mutation behaves like a
supercritical branching process with deaths at rate 1 and births at rate $1 + s q_j$. Such a branching
process survives with probability approximately $s q_j$ and, conditional on survival, the size of the
population after it has evolved for time $t - u$ is approximately

$$\frac{W}{s q_j} e^{s q_j (t - u)},$$

where $W$ has an exponential distribution with mean one. In particular, a successful mutation
that occurs at time

$$u = \tau_j + \frac{1}{s q_j} \log\left(\frac{1}{s q_j}\right) + v$$

has approximately

$$W e^{-s q_j v} e^{s q_j (t - \tau_j)}$$

descendants in the population at time $t$. Write $S = e^{s q_j (t - \tau_j)}$, which from (3.4) is approximately
the number of type $j$ individuals at time $t$ that do not come from unusually early mutations. By
integrating over the possible times when the mutation could occur, we see that the probability
that there will be a mutation that is the ancestor of at least $Sx$ type $j$ individuals at time $t$ is
approximately

$$\int_{-\infty}^{\infty} s e^{s q_j [\log(1/(s q_j))/(s q_j) + v]} \cdot s q_j \cdot P(W e^{-s q_j v} > x) \, dv = \int_{-\infty}^{\infty} s e^{s q_j v} e^{-x e^{s q_j v}} \, dv = \frac{1}{q_j x}. \tag{3.5}$$

Note that the factor of $x^{-1}$ on the right-hand side of (3.5) matches the right-hand side of (3.1).
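
The closed form $1/(q_j x)$ in (3.5) can be checked numerically. The sketch below applies the trapezoidal rule to the integrand $s e^{s q_j v} e^{-x e^{s q_j v}}$, which is what the first integrand reduces to once $P(W > w) = e^{-w}$ is used. The function name, the truncation of the range to $s q_j v \in [-40, 10]$, and the parameter values in the usage note are assumptions for illustration only.

```python
import math


def large_family_probability(s, q, x, steps=100_000):
    """Trapezoidal approximation of the integral of s e^{s q v} exp(-x e^{s q v}) over v."""
    rate = s * q
    # Truncate the range so that rate * v runs over [-40, 10]; the integrand is negligible outside.
    v_min, v_max = -40.0 / rate, 10.0 / rate
    h = (v_max - v_min) / steps

    def integrand(v):
        growth = math.exp(rate * v)
        return s * growth * math.exp(-x * growth)

    total = 0.5 * (integrand(v_min) + integrand(v_max))
    total += sum(integrand(v_min + i * h) for i in range(1, steps))
    return total * h
```

With the hypothetical values $s = 0.01$, $q_j = 20$, and $x = 0.5$, the function returns a value very close to $1/(q_j x) = 0.1$, and doubling $x$ halves the result, in line with the $x^{-1}$ scaling.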

Consider now what happens when we sample $n$ individuals from the population at time $a_N T$
and trace their ancestral lines backwards in time. As noted in [?], one type will dominate the
population at a typical time, so with high probability, the sampled individuals will all have the
same type, which we will call type $\ell$. With high probability, the sampled individuals will be
descended from distinct type $\ell$ ancestors at time $\tau_{\ell+1}$. Because we will see that the time between
when type $\ell$ individuals originate and when they become the dominant type in the population is
approximately $a_N$, this means the ancestral lines will most likely not merge when they are traced
back from time $a_N T$ to time $a_N(T - 1)$, which leads to the result (2.5).

As we trace the lineages further back, with high probability they get traced back to type $\ell - 1$
ancestors at time $\tau_\ell$, then to type $\ell - 2$ ancestors at time $\tau_{\ell-1}$, and so on. At each stage of this
process, there is a small probability that a group of ancestral lines will merge together because
they get traced back to an individual that acquired an unusually early mutation. Because of
the agreement between (3.1) and (3.5), these mergers follow the same dynamics, in the limit as
$N \to \infty$, as the Bolthausen-Sznitman coalescent.

The explanation given here for the appearance of the Bolthausen-Sznitman coalescent is
very similar to that given by Desai, Walczak, and Fisher [11] and by Neher and Hallatschek
[21], though these authors did not work directly from the Poisson process construction of the
Bolthausen-Sznitman coalescent.


3.3 Comparison with branching Brownian motion 

Theorem 2.1 resembles the main result of [?], in which the authors confirmed nonrigorous
predictions of Brunet, Derrida, Mueller, and Munier [?] and showed that the Bolthausen-Sznitman
coalescent describes the genealogy in a different population model involving selection. In [4],
the population was modeled by branching Brownian motion with absorption, in which initially
there is some configuration of particles in $(0, \infty)$, each particle independently moves according
to Brownian motion with drift $-\nu_N$, each particle divides into two at rate one, and particles are
killed upon reaching the origin. In this model, the particles represent individuals in a population,
the position of a particle corresponds to the fitness of the individual, branching events represent
births, and killing at the origin models the deaths of individuals whose fitness is too low. It was
shown in [3] that if the initial configuration of particles and the drift parameter $\nu_N$ are chosen
so that the number of particles in the system stays comparable to $N$, then the genealogy of this
population is given by the Bolthausen-Sznitman coalescent.

One difference between the model in [4] and the model studied in this paper is that for
branching Brownian motion with absorption, all individuals have the same birth rate, while
individuals with low fitness are killed. In the model considered here, all individuals have the
same death rate, while individuals with higher fitness are more likely to give birth. In part
because of this difference, the two population models behave quite differently in many respects.
For example, in the branching Brownian motion model, the speed of evolution is measured by
the drift $\nu_N$ required to maintain a stable population size, which is

$$\nu_N = \sqrt{2 - \frac{2\pi^2}{(\log N + 3 \log \log N)^2}}.$$

That is, as $N \to \infty$, the speed of evolution tends to the limiting value $\sqrt{2}$ at the rate of $(\log N)^{-2}$.
This kind of behavior was first observed by Brunet and Derrida [6] and was verified rigorously
for other probabilistic models in [?, 18, 20]. However, as shown in [10, 27], the population model
studied in the present paper does not show this behavior. Also, for branching Brownian motion
with absorption, once the particles reach a sort of equilibrium, the density of particles near $y$ is
roughly proportional to

$$e^{-\nu_N y} \sin\left(\frac{\pi y}{L_N}\right),$$

where $L_N = (\log N + 3 \log \log N)/\sqrt{2}$. This is again quite different from the results for the
model studied in this paper, where the distribution of fitnesses has a Gaussian-like shape; see,
for example, [?]. Finally, for branching Brownian motion with absorption, if two
particles are sampled at some time, then the time that one has to go back to find a common ancestor
of these two particles is comparable to $(\log N)^3$, as compared with the time scaling by $a_N$ in
Theorem 2.1. Yet, in spite of these differences, we find that the Bolthausen-Sznitman coalescent
describes the genealogy in both models.


3.4 Connection with multitype branching processes 

We mention here how the appearance of the Bolthausen-Sznitman coalescent in this model could
have been predicted from known results about multitype branching processes. Consider a two-
type Yule process in which type 1 individuals give birth to type 1 individuals at rate $\lambda$ and to
type 2 individuals at rate $\mu$, and type 2 individuals give birth to type 2 individuals at rate $\lambda + s$. If
we say that type 2 individuals belong to the same family when they are descended from the same
mutation, then the sizes of type 2 families at some large time $t$ can be approximated by the points
of a Poisson process on $(0, \infty)$ with intensity $C x^{-1-\alpha}$, where $C$ is a constant and $\alpha = \lambda/(\lambda + s)$;
see Theorem 3 of [?] and the following corollary. This implies that the total number of type 2
individuals has approximately a stable law of index $\alpha$ and that the distribution of the family sizes,
normalized to sum to one, is the Poisson-Dirichlet distribution with parameters $(\alpha, 0)$, which was
introduced in [23].

If $(\Pi(t), t \ge 0)$ is the Bolthausen-Sznitman coalescent, then the distribution of the block sizes
of $\Pi(t)$, normalized to sum to one, converges as $n \to \infty$ to the Poisson-Dirichlet distribution
with parameters $(e^{-t}, 0)$, as shown in [22]. Thus, the appearance of stable laws in the work of
Durrett and Moseley [?] and Durrett, Foo, Leder, Mayberry, and Michor [12], who studied a
multitype branching process model for tumor progression, and the appearance of the Poisson-
Dirichlet distribution in the work of Leviyang [?], who studied the coalescence of HIV lineages
in a similar model, strongly suggest that the Bolthausen-Sznitman coalescent should describe the
genealogy in similar models when the selective advantage $s$ is tending to zero. This conjecture
is confirmed by Theorem 2.1 above. Indeed, the work [12, ?, ?], which appeared before the
work of Desai, Walczak, and Fisher [11] and Neher and Hallatschek [21], served as the original
motivation for the present paper.


4 Review of results from [27]


The population model considered in this paper was also studied extensively in [27], and in the
present paper, we will make heavy use of some of the results and techniques developed in [27].
In this section, we will state the results from [27] that we will need.


4.1 Evolution of type j individuals 

We first present some results summarizing how the type $j$ individuals evolve. Let $\varepsilon > 0$, $\delta > 0$,
and $T > 1$. Recall the definition of $k_N$ from (2.1), and let

$$k^* = \max\left\{ j \in \mathbb{N} : j < k_N + \frac{2 k_N \log k_N}{\log(s/\mu)} \right\}.$$

Note from assumption A2 that $(2 k_N \log k_N)/\log(s/\mu) \to 0$ as $N \to \infty$. As discussed in [27], for
$j \le k^*$, individuals of type $j$ appear in the population very quickly. To understand the evolution
of the type $j$ individuals for $j \ge k^* + 1$, define

$$b = \log\left(\frac{24000 \, T}{\delta^2 \varepsilon}\right). \tag{4.1}$$

Also, define $\tau_j$ as in (3.2), and then set

$$q_j^* = \begin{cases} j - k_N & \text{if } a_N - 2a_N/k_N \le \tau_j \le a_N + 2a_N/k_N \\ j - M(\tau_j) & \text{otherwise} \end{cases}$$

and

$$q_j = \max\{1, q_j^*\}. \tag{4.2}$$

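Next, let

$$\xi_j = \max\left\{ \tau_j, \ \tau_j + \frac{1}{s q_j} \log\left(\frac{1}{s q_j}\right) + \frac{b}{s q_j} \right\}, \tag{4.3}$$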

as in [27]. Every type $j$ individual at time $t$ has an ancestor that acquired a $j$th mutation before
time $t$. If this $j$th mutation occurred at or before time $\xi_j$, we call the individual an early type $j$
individual. When an individual gets its $j$th mutation, we call this a type $j$ mutation, and we call
such a mutation an early type $j$ mutation if it occurs at or before time $\xi_j$. Let $X_{j,1}(t)$ denote the
number of early type $j$ individuals at time $t$, and let $X_{j,2}(t)$ denote the number of other type $j$
individuals at time $t$, which means

$$X_j(t) = X_{j,1}(t) + X_{j,2}(t).$$

For $t \ge 0$, let

$$G_j(t) = s(j - M(t)) - \mu,$$

which represents the growth rate of the type $j$ individuals in the population at time $t$. For
$j \ge k^* + 1$, let

$$\gamma_j = \tau_j + a_N \tag{4.4}$$

and

$$\tau_j^* = \tau_j + \frac{a_N}{4 T k_N}.$$

Proposition 4.1 collects several results related to how the type $j$ individuals evolve. The first
four parts of the proposition are identical to Proposition 3.3 of [27], except for the last statement
of part 1, which comes instead from Lemma 8.18 of [27]. The first two parts of the proposition
describe how the type $j$ individuals emerge before time $\tau_{j+1}$. Part 3 describes the evolution
of the type $j$ individuals after time $\tau_{j+1}$ but before the type $j$ individuals start to get close to
extinction. Part 4 bounds the extinction time for the type $j$ individuals, as well as the size of
the type $j$ population as it nears extinction. Part 5 of the proposition, which is Remark 6.9 in
[27], demonstrates that nearly all individuals in the population have type $j$ between times $\gamma_j$ and
$\gamma_{j+1}$. Finally, part 6, which is a combination of parts 1 and 3 of Proposition 3.6 in [27], bounds
the difference between $\tau_j$ and $\tau_{j+1}$.


Proposition 4.1. There exist positive constants $C_1$ and $C_2$, depending on $\delta$, $\varepsilon$, and $T$, such that
if $N$ is sufficiently large, then the following statements all hold with probability at least $1 - \varepsilon$:

1. For all $j \ge k^* + 1$ and all $t \in [\tau_j^*, \tau_{j+1}] \cap [0, a_N T]$, we have

$$X_{j,1}(t) \le C_1 \exp\left( \int_{\tau_j}^t G_j(v) \, dv \right). \tag{4.5}$$

Also, $X_{j,1}(t) \le s/(2\mu)$ for all $t \le \tau_j^* \wedge a_N T$, and no early type $j$ individual acquires a type
$j + 1$ mutation until after time $\tau_{j+1} \wedge a_N T$. Moreover, no individual that gets a $j$th mutation
at or before time $\tau_j$ has a descendant alive in the population at time $\tau_j^*$.

2. For all $j \ge k^* + 1$ and all $t \in [\tau_j^*, \tau_{j+1}] \cap [0, a_N T]$, we have

$$(1 - 4\delta) \exp\left( \int_{\tau_j}^t G_j(v) \, dv \right) \le X_{j,2}(t) \le (1 + 4\delta) \exp\left( \int_{\tau_j}^t G_j(v) \, dv \right). \tag{4.6}$$

Moreover, the upper bound holds for all $t \in [\xi_j, \tau_{j+1}] \cap [0, a_N T]$.

3. For all $j \ge k^* + 1$ and all $t \in [\tau_{j+1}, \gamma_{j+K}] \cap [0, a_N T]$, we have

$$\frac{(1 - \delta) s}{\mu} \exp\left( \int_{\tau_{j+1}}^t G_j(v) \, dv \right) \le X_j(t) \le \frac{(1 + \delta) s}{\mu} \exp\left( \int_{\tau_{j+1}}^t G_j(v) \, dv \right). \tag{4.7}$$

4. Let $K = \lfloor k_N/4 \rfloor$. For all $j \ge k^* + 1$, we have

Xj(t) < — exp f f 

l 1 \ JTj + i 

/or all t € ajvT 1 ]. Also, /or all j > k* + 1 such that 7 j+n 7 fc Ar ] < aA^T, we have 

Xj(t ) = 0 /or all t > 7i+fi7M • 

5. For all $j \ge k^* + 1$, we have



1 OO 

^ ^ -XTi(0 < C^-^+i-*) 
i=j+l 



for all $t \in [(4/s) \log k_N, \gamma_{j+1}] \cap [0, a_N T]$ and


i J - 1 


for all $t \in [\gamma_j, \gamma_{j+1}] \cap [0, a_N T]$.


6. We have $\tau_{k^*+1} \le 2a_N/k_N$. Also, for all $j \ge k^* + 1$ such that either $\tau_j + 2a_N/k_N \le a_N T$ or
$\tau_{j+1} \le a_N T$, we have

$$\frac{a_N}{3 k_N} \le \tau_{j+1} - \tau_j \le \frac{2 a_N}{k_N}. \tag{4.9}$$


More precisely, 


and 


3k]\r 

Tj+i/att 


L 


k N 

1 + 25 


/ 


«(*) dt < 

Tj/a, N k N 


Tj+ i/a N 1 — 25 

{q(t) + t mhlk * +i/aN)} ) dt > 

Tj / a N 


k N 


where $q$ is the function defined later in (4.13).

Remark 4.2. Let

$$J = 3 k_N T + k^* + 1. \tag{4.10}$$

As noted in Remark 3.7 of [27], when (4.9) holds, we have

$$\tau_J \ge \tau_J - \tau_{k^*+1} \ge \frac{a_N}{3 k_N} (J - (k^* + 1)) \wedge a_N T = a_N T,$$

and furthermore when the statement of part 1 of Proposition 4.1 also holds, no individual of type
$J + 1$ or higher can appear until after time $a_N T$.


The next proposition contains some bounds related to the quantities $G_j(t)$ and $q_j$ that are
important for the analysis that follows. The first three parts of the proposition come from Lemma
8.8 of [27]. The fourth part is part of Lemma 6.1 of [27], and the fifth comes from Lemmas 8.25
and 8.26 in [27].

Proposition 4.3. There is a positive constant $C_3$, depending on $\varepsilon$, $\delta$, and $T$, such that if $N$ is
sufficiently large, then the following statements all hold for all $j$ such that $k^* + 1 \le j \le J$ with
probability at least $1 - \varepsilon$:

1. If $\tau_j > a_N + 2a_N/k_N$ and $t \in [\tau_j, \tau_{j+1} \wedge a_N T]$, then $s(q_j - C_3) \le G_j(t) \le s(q_j + C_3)$.

2. If $t \in [\tau_j, \tau_{j+1} \wedge a_N T]$, then $(1 - 2\delta) s k_N \le G_j(t) \le G_j(t) + \mu \le (e + 2\delta) s k_N$.

3. If $\tau_j \le a_N T$, then $(1 - 2\delta) k_N \le q_j \le (e + 2\delta) k_N$.

4. If $\tau_{j+1} \le a_N T$, then $\exp\left( \int_{\tau_j}^{\tau_{j+1}} G_j(v) \, dv \right) \le 2s/\mu$.

5. If j > k* + 1 + K, then 

Gj{v)dv j e - sk N( u -r j+ 1)/5 if u E [rj+i^j-x] n [0, onT] 

- I {s/p)~ kN/ul ifu G [yj-K,Tj+K\ n [0,ajvT]. 

Let $A$ be the event that the six statements in Proposition 4.1 and the five statements of
Proposition 4.3 all hold. Note that the event $A$ depends on $\varepsilon$, $\delta$, $T$, and $N$. Then Propositions 4.1
and 4.3 imply that

$$P(A) \ge 1 - 2\varepsilon \tag{4.11}$$

if $N$ is sufficiently large. We now define a random time $\zeta$, which we interpret as being the
first time that one of the statements of Proposition 4.1 or Proposition 4.3 fails to hold. Write
$X(t) = (X_0(t), X_1(t), \dots)$, and let $(\mathcal{F}_t, t \ge 0)$ denote the natural filtration of the population
process $(X(t), t \ge 0)$. Then define

$$\zeta = \inf\{t : P(A \mid \mathcal{F}_t) = 0\}.$$

Since Propositions 4.1 and 4.3 only describe the behavior of the process up to time $a_N T$, the
event $A$ is equivalent to the event $\{\zeta > a_N T\}$, which in turn is equivalent to the event $\{\zeta = \infty\}$.

Note that the definition given here for $\zeta$ is not quite the same as the definition in [27] because in
[27] some additional properties were listed that are not relevant for the present work, and some
of the properties listed above were derived from others. Nevertheless, the idea is the same in
both papers. Namely, if $t < \zeta$, then all of the properties specified in Propositions 4.1 and 4.3
hold through time $t$.


4.2 Selective advantage of the fittest individuals 

The result below, which is Theorem 1.1 of [27], gives an asymptotic result for the difference in
fitness between the fittest individual in the population and an individual of average fitness.

Proposition 4.4. For $t \ge 0$, let

$$Q(t) = \max\{j : X_j(t) > 0\} - M(t). \tag{4.12}$$

Assume A1-A3 hold. There is a unique bounded function $q : [0, \infty) \to [0, \infty)$ such that

$$q(t) = \begin{cases} e^t & \text{if } 0 \le t < 1 \\ \int_{t-1}^t q(u) \, du & \text{if } t \ge 1. \end{cases} \tag{4.13}$$

If $S$ is a compact subset of $(0, 1) \cup (1, \infty)$, then

$$\sup_{t \in S} \left| \frac{Q(a_N t)}{k_N} - q(t) \right| \to_p 0, \tag{4.14}$$

where $\to_p$ denotes convergence in probability as $N \to \infty$.

The next proposition collects some properties of the function $q$. All of these results are part
of Lemma 7.2 of [27] except for (4.16), which follows from (4.15) and the definition of $q$.

Proposition 4.5. The function $q$ defined in (4.13) is continuous on $[0, 1) \cup (1, \infty)$, and

$$\lim_{t \to \infty} q(t) = 2.$$

Also,

$$1 \le q(t) \le e \quad \text{for all } t \ge 0, \tag{4.15}$$

and if $t < u$ with $1 \notin (t, u]$, then

$$|q(u) - q(t)| \le e(u - t). \tag{4.16}$$
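
The function $q$ in (4.13) is easy to approximate numerically, which gives a quick sanity check on the limit and the bounds in Proposition 4.5. The sketch below is one possible discretization, not a method from the paper: it replaces the integral over $[t-1, t)$ by a left-endpoint Riemann sum on a uniform grid, so its error is of the order of the grid spacing. The function name `solve_q` and its parameters are assumptions for this example.

```python
import math


def solve_q(t_max=10.0, steps_per_unit=1000):
    """Approximate q from (4.13) at the grid points t_i = i / steps_per_unit, 0 <= t_i <= t_max."""
    m = steps_per_unit
    h = 1.0 / m
    n = int(round(t_max * m))
    # On [0, 1) the function is known explicitly: q(t) = e^t.
    q = [math.exp(i * h) for i in range(min(m, n + 1))]
    window_sum = sum(q)  # sum of the m most recent grid values
    for i in range(m, n + 1):
        # Left-endpoint Riemann sum for the integral of q over [t_i - 1, t_i).
        q_i = h * window_sum
        q.append(q_i)
        # Slide the window forward by one grid point.
        window_sum += q_i - q[i - m]
    return q
```

With the default grid, the computed value at $t = 1$ is close to $e - 1$, reflecting the jump of $q$ at $t = 1$, and the value at $t = 10$ is close to the limit 2.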


4.3 A useful martingale 


Here we review the construction of a martingale that was central to the analysis in [27] and will
be important again in the present paper. As in [27], let $F_j(t)$ be the fitness of a type $j$ individual
at time $t$, which is $\max\{0, 1 + s(j - M(t))\}$, divided by the sum of the fitnesses of all individuals
in the population at time $t$, which is $N$ if every individual’s fitness is strictly positive. Remark 4.2
and assumption A3 imply that if $N$ is sufficiently large, then every individual’s fitness is strictly
positive at time $t$ for all $t < \zeta$, in which case

$$F_j(t) = \frac{1 + s(j - M(t))}{N}. \tag{4.17}$$

To define birth and death rates, we follow closely the discussion in [27] and observe that there 
are three ways that the number of type j individuals could change at time t: 


1. Each type $j - 1$ individual acquires a $j$th mutation at rate $\mu$. Therefore, at time $t$, the rate
at which a type $j$ individual appears due to a mutation is $\mu X_{j-1}(t-)$, where we adopt the
convention that $X_{-1}(t) = 0$ for all $t \ge 0$ so that our formulas are valid when $j = 0$.


2. The number of type $j$ individuals could increase by one at time $t$ due to a birth. This
happens if one of the $N - X_j(t-)$ other individuals dies at time $t$, which happens at rate
$N - X_j(t-)$ because each individual dies at rate one, and if the new individual born has
type $j$, which happens with probability $X_j(t-) F_j(t-)$. Therefore, we define the birth rate

$$B_j(t) = (N - X_j(t)) F_j(t). \tag{4.18}$$


3. The number of type $j$ individuals could decrease at time $t$ due to a mutation or death.
The rate at which one of the type $j$ individuals becomes type $j + 1$ due to a mutation
is $\mu X_j(t-)$. Death events that reduce the number of type $j$ individuals happen at rate
$X_j(t-)(1 - X_j(t-) F_j(t-))$ because there are $X_j(t-)$ type $j$ individuals each dying at rate
one, and when a death occurs, the probability that the new individual born does not have
type $j$ is $1 - X_j(t-) F_j(t-)$. Therefore, we define the death rate

$$D_j(t) = 1 + \mu - X_j(t) F_j(t). \tag{4.19}$$
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For alii > 0 and j € Z + , let 


G*(t) = B j (t)-D j (t). 


One can easily check that whenever (|4.17|) holds, we have G*(t) = Gj(t). Also, as shown in 
section 5.2 of E3, whenever (14.171) holds and j < J, we can see, using assumption A3, that for 
sufficiently large N, 


Bj(t) + Dj(t) 


(N-2X J (t))(l + s(j-M(t))) 
N 


+ 1 + p. < “2 + sJ + p <3. 


(4.20) 
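As a sanity check on the bookkeeping above, the following Python sketch evaluates $F_j$, $B_j$, and $D_j$ from (4.17)-(4.19) for a small hypothetical population using exact rational arithmetic. It assumes that $M(t)$ is the mean number of mutations per individual (which is what makes the fitnesses sum to $N$) and that every fitness is strictly positive, so that (4.17) applies; the function name `rates` and all numerical values are illustrative only.

```python
from fractions import Fraction


def rates(counts, s, mu):
    """Per-type quantities from (4.17)-(4.19).

    counts[j] is X_j, the number of type j individuals; s and mu are Fractions.
    Assumes M is the mean number of mutations and that all fitnesses are positive.
    Returns (M, rows), where rows[j] = (F_j, B_j, D_j, B_j - D_j).
    """
    N = sum(counts)
    M = Fraction(sum(j * x for j, x in enumerate(counts)), N)
    rows = []
    for j, x in enumerate(counts):
        fitness = 1 + s * (j - M)
        if fitness <= 0:
            raise ValueError("formula (4.17) requires strictly positive fitness")
        F = fitness / N        # (4.17)
        B = (N - x) * F        # (4.18)
        D = 1 + mu - x * F     # (4.19)
        rows.append((F, B, D, B - D))
    return M, rows


if __name__ == "__main__":
    counts = [5, 10, 5]                          # hypothetical X_0, X_1, X_2
    s, mu = Fraction(1, 100), Fraction(1, 1000)  # hypothetical parameters
    M, rows = rates(counts, s, mu)
    for j, (F, B, D, G_star) in enumerate(rows):
        # B_j - D_j simplifies to s(j - M) - mu, and B_j + D_j stays below 3.
        print(j, G_star, s * (j - M) - mu, float(B + D))
```

Exact fractions are used so that the identity $B_j - D_j = s(j - M) - \mu$ and the first equality in (4.20) can be compared without rounding error.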


The result below is Proposition 4.1 of [27]. The martingale defined in this proposition is
similar to the one obtained in section 4 of [Mi- 

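Proposition 4.6. For all $t \ge 0$ and $j \in \mathbb{Z}^+$, let
\[ Z_j(t) = e^{-\int_0^t G_j^*(v)\,dv} X_j(t) - \int_0^t \mu X_{j-1}(u) e^{-\int_0^u G_j^*(v)\,dv}\,du - X_j(0). \tag{4.21} \]
Then $(Z_j(t), t \ge 0)$ is a mean zero martingale with
\[ \mathrm{Var}(Z_j(t)) = E\left[ \int_0^t e^{-2\int_0^u G_j^*(v)\,dv} \big( \mu X_{j-1}(u) + B_j(u)X_j(u) + D_j(u)X_j(u) \big)\,du \right]. \]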


We will sometimes need to apply the result of Proposition 4.6 to only a subset of the type $j$
individuals in the population. If $\kappa$ and $\gamma$ are stopping times with respect to $(\mathcal{F}_t, t \ge 0)$ such that
$0 \le \kappa \le \gamma$, then for $t \ge 0$ and $j \in \mathbb{Z}^+$, let $X_j^{\kappa,\gamma}(t)$ be the number of type $j$ individuals in the
population at time $t$ that are descended from individuals that acquired a $j$th mutation during the
time interval $(\kappa, \gamma]$. Let $B_j^{\kappa,\gamma}(t)$ and $D_j^{\kappa,\gamma}(t)$ denote the expressions on the right-hand sides
of (4.18) and (4.19) with $X_j^{\kappa,\gamma}(t)$ in place of $X_j(t)$. The result below is Corollary 4.4 of [27].

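Corollary 4.7. Let $\kappa$ and $\gamma$ be stopping times with $\kappa \le \gamma$. For $t \ge \kappa$, let
\[ Z_j^{\kappa,\gamma}(t) = e^{-\int_\kappa^t G_j^*(v)\,dv} X_j^{\kappa,\gamma}(t) - \int_\kappa^{t \wedge \gamma} \mu X_{j-1}(u) e^{-\int_\kappa^u G_j^*(v)\,dv}\,du. \]
Then $(Z_j^{\kappa,\gamma}(\kappa + t), t \ge 0)$ is a mean zero martingale and
\[ \mathrm{Var}(Z_j^{\kappa,\gamma}(\kappa + t) \,|\, \mathcal{F}_\kappa) = E\left[ \int_\kappa^{\kappa + t} e^{-2\int_\kappa^u G_j^*(v)\,dv} \big( \mu X_{j-1}(u) 1_{\{u \le \gamma\}} + B_j^{\kappa,\gamma}(u)X_j^{\kappa,\gamma}(u) + D_j^{\kappa,\gamma}(u)X_j^{\kappa,\gamma}(u) \big)\,du \,\Big|\, \mathcal{F}_\kappa \right]. \]
Furthermore, if $\tau$ is a stopping time with $\kappa \le \tau$, then $(Z_j^{\kappa,\gamma}((\kappa + t) \wedge \tau), t \ge 0)$ is a mean zero
martingale, and $\mathrm{Var}(Z_j^{\kappa,\gamma}((\kappa + t) \wedge \tau) \,|\, \mathcal{F}_\kappa)$ is obtained by replacing $\kappa + t$ with $(\kappa + t) \wedge \tau$ in the
integral above.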


Finally, suppose $\kappa$ is a stopping time with respect to $(\mathcal{F}_t, t \ge 0)$ and $S$ is a set of type $j$
individuals alive at time $\kappa$. Then for $t \ge \kappa$, let $X_j^S(t)$ be the number of type $j$ individuals in the
population at time $t$ that are descended from one of the individuals in the set $S$, and let $B_j^S(t)$
and $D_j^S(t)$ be the expressions on the right-hand sides of (4.18) and (4.19) with $X_j^S(t)$ in place of
$X_j(t)$. Then, the same reasoning used to establish Proposition 4.6 and Corollary 4.7 yields the
following corollary.


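Corollary 4.8. Let $\kappa$ be a stopping time, and let $S$ be a set of type $j$ individuals in the population
at time $\kappa$. For $t \ge \kappa$, let
\[ Z_j^S(t) = e^{-\int_\kappa^t G_j^*(v)\,dv} X_j^S(t) - X_j^S(\kappa). \]
Then $(Z_j^S(\kappa + t), t \ge 0)$ is a mean zero martingale and
\[ \mathrm{Var}(Z_j^S(\kappa + t) \,|\, \mathcal{F}_\kappa) = E\left[ \int_\kappa^{\kappa + t} e^{-2\int_\kappa^u G_j^*(v)\,dv} \big( B_j^S(u)X_j^S(u) + D_j^S(u)X_j^S(u) \big)\,du \,\Big|\, \mathcal{F}_\kappa \right]. \]
Furthermore, if $\tau$ is a stopping time with $\kappa \le \tau$, then $(Z_j^S((\kappa + t) \wedge \tau), t \ge 0)$ is a mean zero
martingale, and $\mathrm{Var}(Z_j^S((\kappa + t) \wedge \tau) \,|\, \mathcal{F}_\kappa)$ is obtained by replacing $\kappa + t$ with $(\kappa + t) \wedge \tau$ in the
integral above.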


Remark 4.9. By the Strong Markov Property of the population process $(X(t), t \ge 0)$, the results
of Corollaries 4.7 and 4.8 hold even when the type $j$ is random, as long as $j$ is $\mathcal{F}_\kappa$-measurable.
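The martingale in Corollary 4.8 can be illustrated in a much simpler setting than the population model. The Python sketch below is a toy example under stated assumptions, not the process studied in this paper: it simulates a linear birth-death process started from one individual in which each individual gives birth at a constant rate `b` and dies at a constant rate `d`, so that the growth rate `b - d` plays the role of $G_j^*$ and is constant. In that case $Z(t) = e^{-(b-d)t}X(t) - 1$ has mean zero, and the variance formula reduces to $(b+d)(1 - e^{-(b-d)t})/(b-d)$. All names and parameter values are hypothetical.

```python
import math
import random


def simulate_family_size(b, d, t_end, rng):
    """Size at time t_end of a linear birth-death process started from 1 individual."""
    x, t = 1, 0.0
    while x > 0:
        # With x individuals, the next event occurs after an Exponential(x * (b + d)) time.
        t += rng.expovariate(x * (b + d))
        if t > t_end:
            break
        # The event is a birth with probability b / (b + d), otherwise a death.
        x += 1 if rng.random() < b / (b + d) else -1
    return x


def martingale_samples(b, d, t_end, n_paths, seed=0):
    """Samples of Z(t_end) = exp(-(b - d) * t_end) * X(t_end) - X(0), with X(0) = 1."""
    rng = random.Random(seed)
    discount = math.exp(-(b - d) * t_end)
    return [discount * simulate_family_size(b, d, t_end, rng) - 1.0
            for _ in range(n_paths)]


if __name__ == "__main__":
    b, d, t_end = 1.2, 1.0, 2.0   # hypothetical constant rates and time horizon
    z = martingale_samples(b, d, t_end, n_paths=20000)
    mean = sum(z) / len(z)
    var = sum((v - mean) ** 2 for v in z) / (len(z) - 1)
    predicted = (b + d) * (1 - math.exp(-(b - d) * t_end)) / (b - d)
    print("sample mean:", mean)
    print("sample variance:", var, "predicted:", predicted)
```

The sample mean should be close to zero and the sample variance close to the predicted value, up to Monte Carlo error. In the population model the rates $B_j^S$ and $D_j^S$ vary in time, which is why the corollary is stated with integrals of $G_j^*$.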


5 Tracing the ancestral lines back to time $a_N(T - 1)$


The rest of the paper is devoted to the proof of Theorem 2.1. Throughout the proof, we will fix
$\varepsilon > 0$, $\delta > 0$, $t > 0$, and $T > t + 2$. We will also assume that $\varepsilon < 1$ and


5 < max 


1 T-(t + 2) 1 3 1 

100’ 40T ’ 19T’ £ J 


(5.1) 


The event $A$ is defined as in section 4 for these choices of $\varepsilon$, $\delta$, and $T$, and for the constants $C_1$,
$C_2$, and $C_3$ from Propositions 4.1 and 4.3.

We sample $n$ individuals at random from the population at time $a_N T$ and randomly label
these individuals with the integers $1, \dots, n$. We then trace the ancestral lines of these individuals
back to time $a_N(T - (t + 1))$. Recall that if $0 \le u \le t - 1$, then $\Pi_N(u)$ is the partition of $\{1, \dots, n\}$
such that $i$ and $j$ are in the same block of $\Pi_N(u)$ if and only if the individuals in the sample
labelled $i$ and $j$ have the same ancestor at time $a_N(T - u)$.

For $1 \le i \le n$ and $0 \le u \le a_N T$, let $U_i(u)$ be the number of mutations carried by the
individual at time $u$ that is the ancestor of the individual labelled $i$ at time $a_N T$. For $1 \le i \le n$
and $1 \le j \le U_i(a_N T)$, let
\[ V_{i,j} = \inf\{u : U_i(u) = j\} \tag{5.2} \]
be the time when the $j$th mutation appears on the $i$th lineage. For $i, j \in \{1, \dots, n\}$, let
\[ T_{i,j} = \sup\{u : \text{the $i$th and $j$th sampled individuals have the same ancestor at time } u\} \tag{5.3} \]
denote the coalescence time of $i$ and $j$.

Throughout the rest of the paper, we use $C$ to denote a positive constant that does not
depend on $\delta$, $\varepsilon$, or $T$ but whose value may change from line to line. Recall that the numbered
constants $C_1$, $C_2$, and $C_3$ do depend on $\delta$, $\varepsilon$, and $T$. We will say that a statement holds "for
sufficiently large $N$" if there is a positive integer $N_0$, possibly depending on $\varepsilon$, $\delta$, and $T$, such
that the statement holds for all $N \ge N_0$.

5.1 The types of the individuals sampled at time $a_N T$

Part 5 of Proposition 4.1 implies that, between times $\gamma_j$ and $\gamma_{j+1}$, the fraction of individuals in
the population having type $j$ is very close to one, except for times very close to the boundary of
this interval. Consequently, when we take a sample from the population at time $a_N T$, typically
either all individuals will have the same type, or else all individuals will have one of two types.
The result below is a weaker form of this statement.

Lemma 5.1. Let
\[ L = \inf\left\{ j : \tau_j \ge a_N(T - 1) - \frac{3a_N}{k_N} \right\}. \tag{5.4} \]
Then
\[ \lim_{N \to \infty} P\big( A \cap \{ U_i(a_N T) \notin \{L, L+1, \dots, L+9\} \text{ for some } i \in \{1, \dots, n\} \} \big) = 0. \]

Proof. It follows from equation (4.9) that on the event $A$, we have $\tau_L \le a_N(T - 1) - a_N/k_N$ and
$\tau_{L+10} \ge a_N(T - 1) + a_N/3k_N$. Therefore, using (4.4), on $A$ we have $\gamma_L \le a_N T - a_N/k_N$ and
$\gamma_{L+10} \ge a_N T + a_N/3k_N$. Therefore, by part 5 of Proposition 4.1, on $A$ we have
\[ \frac{1}{N} \sum_{\ell = L+10}^{\infty} X_\ell(a_N T) \le C_2 e^{-s(\gamma_{L+10} - a_N T)} + \frac{s}{N\mu} \le C_2 \left( \frac{s}{\mu} \right)^{-1/3k_N} + \frac{s}{N\mu}, \tag{5.5} \]
which tends to zero as $N \to \infty$ because $(1/3k_N)\log(s/\mu) \to \infty$ as $N \to \infty$ by assumption A2,
and $s/(N\mu) \to 0$ as $N \to \infty$ by (2.3). Likewise, by part 5 of Proposition 4.1, on $A$ we have
\[ \frac{1}{N} \sum_{\ell = 0}^{L-1} X_\ell(a_N T) \le C_2 e^{-s(a_N T - \gamma_L)} \le C_2 e^{-s a_N/k_N} = C_2 \left( \frac{s}{\mu} \right)^{-1/k_N}, \tag{5.6} \]
which tends to zero as $N \to \infty$. Because the expressions in (5.5) and (5.6) both tend to zero
as $N \to \infty$, we conclude that on $A$, the fraction of individuals in the population at time $a_N T$
having between $L$ and $L + 9$ mutations tends to one as $N \to \infty$. Because the $n$ individuals are
sampled at random from the population, the result follows. □

5.2 The types of the ancestors at time $a_N(T - 1)$

Lemma 5.1 implies that with high probability all individuals sampled at time $a_N T$ will have
between $L$ and $L + 9$ mutations. Lemma 5.2 below shows that for $\ell \in \{L, L+1, \dots, L+9\}$, with
high probability the type $\ell$ individuals in the sample will all be descended from type $\ell$ individuals
at time $\tau_{\ell+1}$.

Lemma 5.2. We have
\[ \lim_{N \to \infty} P\big( A \cap \{ U_i(\tau_{U_i(a_N T)+1}) \ne U_i(a_N T) \text{ for some } i \in \{1, \dots, n\} \} \big) = 0. \]


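Proof. Choose $\ell \in \{L, L+1, \dots, L+9\}$. Recall from Corollary 4.7 that $X_\ell^{\tau_{\ell+1}, a_N T}(a_N T)$ denotes
the number of type $\ell$ individuals at time $a_N T$ that are descended from an individual that got
its $\ell$th mutation during the time interval $(\tau_{\ell+1}, a_N T]$. Equivalently, this is the number of type
$\ell$ individuals at time $a_N T$ whose ancestor in the population at time $\tau_{\ell+1}$ does not have type
$\ell$. Because each individual in the population at time $a_N T$ has probability $n/N$ of being in the
sample, we therefore have
\[ P\big( A \cap \{ U_i(\tau_{\ell+1}) \ne U_i(a_N T) = \ell \text{ for some } i \in \{1, \dots, n\} \} \,\big|\, \mathcal{F}_{a_N T} \big) \le \frac{n X_\ell^{\tau_{\ell+1}, a_N T}(a_N T) 1_A}{N}. \tag{5.7} \]
It suffices to show that the expected value of the right-hand side of (5.7) tends to zero as $N \to \infty$.
By Corollary 4.7 and Remark 4.9, on $A$,
\[ e^{-\int_{\tau_{\ell+1}}^{a_N T} G_\ell(v)\,dv} X_\ell^{\tau_{\ell+1}, a_N T}(a_N T) = \int_{\tau_{\ell+1}}^{a_N T} \mu X_{\ell-1}(u) e^{-\int_{\tau_{\ell+1}}^{u} G_\ell(v)\,dv}\,du + Z_\ell^{\tau_{\ell+1}, a_N T}(a_N T), \tag{5.8} \]
where $(Z_\ell^{\tau_{\ell+1}, a_N T}(\tau_{\ell+1} + t), t \ge 0)$ is a mean zero martingale. Note that (4.9) implies that on $A$,
we have $\gamma_{\ell-1+K} > a_N T$ if $N$ is sufficiently large, and therefore from (4.7) and from part 4 of
Proposition 4.3 we get for $u \in [\tau_{\ell+1}, a_N T]$,
\begin{align*}
\mu X_{\ell-1}(u) e^{-\int_{\tau_{\ell+1}}^{u} G_\ell(v)\,dv} &\le (1 + \delta) s\, e^{\int_{\tau_\ell}^{u} G_{\ell-1}(v)\,dv} e^{-\int_{\tau_{\ell+1}}^{u} G_\ell(v)\,dv} \\
&= (1 + \delta) s\, e^{\int_{\tau_\ell}^{\tau_{\ell+1}} G_\ell(v)\,dv} e^{-s(u - \tau_\ell)} \\
&\le \frac{2(1 + \delta) s^2}{\mu} e^{-s(u - \tau_\ell)}.
\end{align*}
It follows that on $A$, if $N$ is sufficiently large,
\[ \int_{\tau_{\ell+1}}^{a_N T} \mu X_{\ell-1}(u) e^{-\int_{\tau_{\ell+1}}^{u} G_\ell(v)\,dv}\,du \le \frac{2(1 + \delta) s}{\mu} e^{-s(\tau_{\ell+1} - \tau_\ell)}. \tag{5.9} \]
Now on $A$, by (4.9), we have
\[ e^{-s(\tau_{\ell+1} - \tau_\ell)} \le e^{-a_N s/3k_N} = \left( \frac{s}{\mu} \right)^{-1/3k_N}. \tag{5.10} \]
Also, on $A$ we have $a_N T \in [\tau_{\ell+1}, \gamma_{\ell+K}]$ if $N$ is sufficiently large and therefore, by (4.7),
\[ X_\ell(a_N T) \ge (1 - \delta) \frac{s}{\mu} e^{\int_{\tau_{\ell+1}}^{a_N T} G_\ell(v)\,dv}. \tag{5.11} \]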


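Combining (5.8), (5.9), (5.10), and (5.11), and using that $X_\ell(a_N T) \le N$, we get that for
sufficiently large $N$,
\begin{align*}
E\left[ \frac{X_\ell^{\tau_{\ell+1}, a_N T}(a_N T) 1_A}{N} \right] &\le E\left[ \frac{e^{-\int_{\tau_{\ell+1}}^{a_N T} G_\ell(v)\,dv} X_\ell^{\tau_{\ell+1}, a_N T}(a_N T) 1_A}{(1 - \delta)(s/\mu)} \right] \\
&\le E\left[ \frac{2(1 + \delta)(s/\mu)^{1 - 1/3k_N} + Z_\ell^{\tau_{\ell+1}, a_N T}(a_N T)}{(1 - \delta)(s/\mu)} \right] \\
&= \frac{2(1 + \delta)}{1 - \delta} \left( \frac{s}{\mu} \right)^{-1/3k_N}. \tag{5.12}
\end{align*}
Because $(1/3k_N)\log(s/\mu) \to \infty$ as $N \to \infty$ by assumption A2, the expression on the right-hand
side of (5.12) tends to zero as $N \to \infty$. The lemma follows by taking expectations of both sides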
in (5.7). □

5.3 Coalescence between times $a_N(T - 1)$ and $a_N T$

Our next goal is to show that for $\ell \in \{L, L+1, \dots, L+9\}$, the type $\ell$ individuals in the sample
at time $a_N T$ all come from distinct ancestors at time $\tau_{\ell+1}$. That is, the lineages do not coalesce
as they are traced back from time $a_N T$ to time $\tau_{\ell+1}$. The precise statement is given in Lemma
5.4 below. Because $\gamma_{\ell+1} - \tau_{\ell+1} = a_N$, this observation is very close to the statement (2.5) that
none of the lineages coalesce when they are traced back $a_N$ time units. We first establish the
following preliminary lemma, which is more general than what is needed for the proof of Lemma
5.4 but will also be used later to prove Lemma 6.6.


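Lemma 5.3. Suppose $k^* + 1 + K \le j \le J$. Randomly label the type $j$ individuals at time $\tau_{j+1}$
by the integers $1, 2, \dots, \lceil s/\mu \rceil$. For $t \ge \tau_{j+1}$, let $X_j^i(t)$ denote the number of type $j$ individuals at
time $t$ that are descended from the individual labelled $i$ at time $\tau_{j+1}$. Let $\gamma = \gamma_{j+K} \wedge \zeta \wedge a_N T$,
and let
\[ R_{i,j} = \sup_{t \in [\tau_{j+1}, \gamma)} \left( \frac{X_j^i(t)}{X_j(t)} \right)^2. \tag{5.13} \]
Then
\[ E\left[ \sum_{i=1}^{\lceil s/\mu \rceil} R_{i,j} \right] \le \frac{C\mu}{s^2 k_N}. \tag{5.14} \]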


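Proof. By Corollary 4.8 applied when $S$ consists only of the individual labelled $i$ at time $\tau_{j+1}$,
for $i = 1, 2, \dots, \lceil s/\mu \rceil$ and $t \ge \tau_{j+1}$, we have
\[ X_j^i(t \wedge \gamma) = e^{\int_{\tau_{j+1}}^{t \wedge \gamma} G_j(v)\,dv} \big( 1 + Z_j^i(t) \big), \tag{5.15} \]
where $(Z_j^i(\tau_{j+1} + t), t \ge 0)$ is a mean zero martingale. Now suppose $t \in [\tau_{j+1}, \gamma)$. Using (5.15)
and (4.7),
\[ \left( \frac{X_j^i(t)}{X_j(t)} \right)^2 \le \frac{\mu^2}{(1 - \delta)^2 s^2} \big( 1 + Z_j^i(t) \big)^2. \]
Taking the supremum of both sides over $t \in [\tau_{j+1}, \gamma)$, then taking expectations and using that
$(a + b)^2 \le 2a^2 + 2b^2$, we get
\[ E[R_{i,j}] \le \frac{2\mu^2}{(1 - \delta)^2 s^2} \left( 1 + E\left[ \sup_{t \in [\tau_{j+1}, \gamma)} (Z_j^i(t))^2 \right] \right). \tag{5.16} \]
By the $L^2$ Maximum Inequality for martingales, Corollary 4.8, and the reasoning used to derive
(4.20),
\[ E\left[ \sup_{t \in [\tau_{j+1}, \gamma)} (Z_j^i(t))^2 \,\Big|\, \mathcal{F}_{\tau_{j+1}} \right] \le 4 E\left[ \int_{\tau_{j+1}}^{\gamma} e^{-2\int_{\tau_{j+1}}^{u} G_j(v)\,dv} \cdot 3X_j^i(u)\,du \,\Big|\, \mathcal{F}_{\tau_{j+1}} \right]. \]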
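Combining this result with (5.15) gives
\[ E\left[ \sup_{t \in [\tau_{j+1}, \gamma)} (Z_j^i(t))^2 \,\Big|\, \mathcal{F}_{\tau_{j+1}} \right] \le 12 E\left[ \int_{\tau_{j+1}}^{\gamma} e^{-\int_{\tau_{j+1}}^{u} G_j(v)\,dv} \big( 1 + Z_j^i(u) \big)\,du \,\Big|\, \mathcal{F}_{\tau_{j+1}} \right]. \tag{5.17} \]
Note that $1 + Z_j^i(u) \ge 0$ for all $u \in [\tau_{j+1}, \gamma)$ by (5.15). Therefore, by part 5 of Proposition 4.3
and the fact that $(Z_j^i(\tau_{j+1} + t), t \ge 0)$ is a mean zero martingale,
\begin{align*}
E\left[ \int_{\tau_{j+1}}^{\gamma_{j-K} \wedge \gamma} e^{-\int_{\tau_{j+1}}^{u} G_j(v)\,dv} \big( 1 + Z_j^i(u) \big)\,du \,\Big|\, \mathcal{F}_{\tau_{j+1}} \right] &\le E\left[ \int_{\tau_{j+1}}^{\infty} e^{-s k_N (u - \tau_{j+1})/5} \big( 1 + Z_j^i(u) \big)\,du \,\Big|\, \mathcal{F}_{\tau_{j+1}} \right] \\
&= \int_{\tau_{j+1}}^{\infty} e^{-s k_N (u - \tau_{j+1})/5}\,du \\
&= \frac{5}{s k_N}. \tag{5.18}
\end{align*}
Also, using part 5 of Proposition 4.3 again and the fact that $\gamma - \gamma_{j-K} \wedge \gamma \le (2a_N/k_N)(2K) \le a_N$
for sufficiently large $N$ by (4.9),
\begin{align*}
E\left[ \int_{\gamma_{j-K} \wedge \gamma}^{\gamma} e^{-\int_{\tau_{j+1}}^{u} G_j(v)\,dv} \big( 1 + Z_j^i(u) \big)\,du \,\Big|\, \mathcal{F}_{\tau_{j+1}} \right] &\le E\left[ \left( \frac{s}{\mu} \right)^{-k_N/241} \int_{\gamma_{j-K} \wedge \gamma}^{\gamma} \big( 1 + Z_j^i(u) \big)\,du \,\Big|\, \mathcal{F}_{\tau_{j+1}} \right] \\
&\le a_N \left( \frac{s}{\mu} \right)^{-k_N/241}. \tag{5.19}
\end{align*}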


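Because $s k_N \cdot a_N (s/\mu)^{-k_N/241} \to 0$ as $N \to \infty$, as can easily be seen by taking logarithms,
equations (5.17), (5.18), and (5.19) imply that
\[ E\left[ \sup_{t \in [\tau_{j+1}, \gamma)} (Z_j^i(t))^2 \,\Big|\, \mathcal{F}_{\tau_{j+1}} \right] \le 12 \left( \frac{5}{s k_N} + a_N \left( \frac{s}{\mu} \right)^{-k_N/241} \right) \le \frac{C}{s k_N} \]
for sufficiently large $N$. Therefore, using (5.16), we get for sufficiently large $N$,
\[ E\left[ \sum_{i=1}^{\lceil s/\mu \rceil} R_{i,j} \right] \le \left\lceil \frac{s}{\mu} \right\rceil \frac{2\mu^2}{(1 - \delta)^2 s^2} \left( 1 + \frac{C}{s k_N} \right). \]
The result follows because $s k_N \to 0$ as $N \to \infty$ by assumption A3. □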

Note that in the statement of Lemma 5.4 below, we consider only the lineages labelled 1 and
2 to simplify notation. This is sufficient because individuals are sampled uniformly at random.
To bound the probability that the event in question occurs for some pair of lineages, we may
simply multiply the probability that the event occurs for the lineages 1 and 2 by $\binom{n}{2}$.

Lemma 5.4. We have
\[ \lim_{N \to \infty} P\big( A \cap \{ U_1(a_N T) = U_2(a_N T) = \ell \text{ and } T_{1,2} \ge \tau_{\ell+1} \text{ for some } \ell \} \big) = 0. \]

Proof. We know from Lemma 5.2 that with probability tending to one as $N \to \infty$, on $A$ all type
$\ell$ individuals sampled at time $a_N T$ have type $\ell$ ancestors at time $\tau_{\ell+1}$. Therefore, it suffices to
show that
\[ \lim_{N \to \infty} P\big( A \cap \{ U_1(a_N T) = U_2(a_N T) = U_1(\tau_{\ell+1}) = U_2(\tau_{\ell+1}) = \ell \text{ and } T_{1,2} \ge \tau_{\ell+1} \text{ for some } \ell \} \big) = 0. \tag{5.20} \]
That is, we need to show it is unlikely that the first two individuals in the sample are both type
$\ell$ individuals that are descended from the same type $\ell$ individual at time $\tau_{\ell+1}$.

Randomly label the type $\ell$ individuals at time $\tau_{\ell+1}$ by the integers $1, 2, \dots, \lceil s/\mu \rceil$. Let $X_\ell^i(t)$
denote the number of type $\ell$ individuals at time $t$ descended from the $i$th type $\ell$ individual in the
population at time $\tau_{\ell+1}$. Since each individual at time $a_N T$ is equally likely to be sampled,
\begin{align*}
&P\big( A \cap \{ U_1(a_N T) = U_2(a_N T) = U_1(\tau_{\ell+1}) = U_2(\tau_{\ell+1}) = \ell \} \cap \{ T_{1,2} \ge \tau_{\ell+1} \} \,\big|\, \mathcal{F}_{a_N T} \big) \\
&\qquad = \sum_{i=1}^{\lceil s/\mu \rceil} \frac{X_\ell^i(a_N T)(X_\ell^i(a_N T) - 1) 1_A}{N(N - 1)}. \tag{5.21}
\end{align*}
By Lemma 5.1 it suffices to consider $\ell \in \{L, L+1, \dots, L+9\}$. Part 6 of Proposition 4.1
implies that on $A$, we have $\tau_{k^*+1+K} \le 2a_N(K+1)/k_N$, and therefore $L \ge k^* + 1 + K$ for sufficiently
large $N$. Also, in view of (4.9), on $A$ we have $\gamma_{L+K} > a_N T$. Therefore, applying Lemma 5.3
and noting that the probability of a change in the population at exactly time $a_N T$ is zero, for
each fixed positive integer $\ell$ we have
\[ E\left[ \sum_{i=1}^{\lceil s/\mu \rceil} \frac{X_\ell^i(a_N T)(X_\ell^i(a_N T) - 1) 1_{\{L \le \ell \le L+9\} \cap A}}{N(N - 1)} \right] \le E\left[ \sum_{i=1}^{\lceil s/\mu \rceil} R_{i,\ell} \right] \le \frac{C\mu}{s^2 k_N}. \tag{5.22} \]
Taking expectations of both sides of (5.21) and then using (5.22) and the fact that $L + 9 \le J$ on
$A$ by Remark 4.2, we get that the probability in (5.20) is bounded above by $CJ\mu/(s^2 k_N)$, which
tends to zero as $N \to \infty$ by (2.3). Thus, (5.20) holds, which implies the result of the lemma. □


Remark 5.5. It follows from Lemmas 5.1 and 5.2 that with probability tending to one as $N \to \infty$,
we have $U_i(\tau_{L+10}) = U_i(a_N T)$ for all $i \in \{1, \dots, n\}$. Because individuals in the population model
inherit all of their parents' mutations, two lineages can only coalesce if they have the same type.
That is, we must have $U_i(T_{i,j}) = U_j(T_{i,j})$ for $i, j \in \{1, \dots, n\}$. It therefore follows from Lemma
5.4 that with probability tending to one as $N \to \infty$, no lineages coalesce as they are traced back
from time $a_N T$ to time $\tau_{L+10}$. The fact that the probability of coalescence between times $a_N(T - 1)$ and
$\tau_{L+10}$ tends to zero as $N \to \infty$, which would imply (2.5), will be established later.


6 Tracing the ancestral lines between times $\tau_j$ and $\tau_{j+1}$

Lemmas 5.2 and 5.4 show that the type $\ell$ individuals in the sample at time $a_N T$ are typically
descended from distinct type $\ell$ ancestors at time $\tau_{\ell+1}$. In this section, we consider tracing
these ancestral lines back further in time. In particular, we focus on what happens when lineages
are traced back from time $\tau_{j+1}$ to $\tau_j$. We establish that with high probability, type $j$ individuals
at time $\tau_{j+1}$ are descended from type $j - 1$ individuals at time $\tau_j$, and lineages will only coalesce
when many type $j$ lineages are traced back to an individual that acquired its $j$th mutation before
the time $\xi_j$ defined in (4.3).

6.1 Approximating $\tau_j$ by the fixed time $\tau_j^*$


We define here some fixed times $\tau_j^*$ that approximate the random times $\tau_j$. Let $\tau_{k^*+1}^* = 0$. For
integers $j \ge k^* + 1$, let
\[ \tau_{j+1}^* = \tau_j^* + \frac{a_N}{k_N\, q(\tau_j^*/a_N)}, \tag{6.1} \]
where $q$ is the function defined in (4.13). Because $1 \le q(u) \le e$ for all $u \ge 0$ by
Proposition 4.5, we have
\[ \frac{a_N}{e k_N} \le \tau_{j+1}^* - \tau_j^* \le \frac{a_N}{k_N}. \tag{6.2} \]
For $u \in (0, T]$, let $j^*(u) = \max\{j : \tau_j^* \le a_N u\}$ and $j'(u) = \max\{j : \tau_j \le a_N u\}$. The lemma below
shows that $\tau_j^*$ is a good approximation to $\tau_j$.
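The recursion (6.1) can be iterated directly. The Python sketch below is only an illustration under assumed parameter values: `a_N`, `k_N`, and the starting index `k_star` are hypothetical numbers, and `q_table` is a simple trapezoid-rule approximation of the function $q$ from (4.13). It builds the times $\tau_j^*$, whose spacings can then be compared with (6.2), and evaluates $j^*(u)$.

```python
import math


def q_table(t_max, m=1000):
    """Grid approximation of q from (4.13): entry k approximates q(k / m)."""
    h = 1.0 / m
    n = int(math.ceil(t_max * m))
    q = [math.exp(k * h) for k in range(min(m, n + 1))]
    for k in range(m, n + 1):
        # Trapezoid-rule version of q(t) = integral of q over [t - 1, t].
        inner = sum(q[k - m + 1:k])
        q.append(h * (0.5 * q[k - m] + inner) / (1.0 - 0.5 * h))
    return q


def tau_star(a_N, k_N, T, k_star=0, m=1000):
    """Times tau*_j for j = k_star + 1, k_star + 2, ..., up to the first one exceeding a_N * T."""
    q = q_table(T + 1.0, m)   # small margin beyond T
    taus = {k_star + 1: 0.0}
    j = k_star + 1
    while taus[j] <= a_N * T:
        # Evaluate q at tau*_j / a_N using the nearest grid point to the left.
        q_val = q[int(taus[j] / a_N * m)]
        taus[j + 1] = taus[j] + a_N / (k_N * q_val)   # recursion (6.1)
        j += 1
    return taus


def j_star(taus, a_N, u):
    """j*(u) = max{j : tau*_j <= a_N * u}."""
    return max(j for j, t in taus.items() if t <= a_N * u)


if __name__ == "__main__":
    a_N, k_N, T = 50.0, 40.0, 5.0   # hypothetical values
    taus = tau_star(a_N, k_N, T)
    print("number of times computed:", len(taus))
    print("j*(T) =", j_star(taus, a_N, T))
```

Because the tabulated values of $q$ lie between 1 and $e$, every spacing produced by the recursion lies between $a_N/(e k_N)$ and $a_N/k_N$, which is the content of (6.2).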

Lemma 6.1. Fix $u \in (0, T]$. On the event $\{\zeta > a_N u\}$, we have
\[ |j^*(u) - j'(u)| \le 9\delta T k_N. \tag{6.3} \]
Likewise, let $j \in \{k^* + 1, \dots, J\}$. On the event $\{\tau_j < \zeta \wedge a_N T\}$, we have
\[ |\tau_j - \tau_j^*| \le 10\delta a_N T. \tag{6.4} \]

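Proof. Suppose $j \in \{k^* + 1, \dots, J\}$ and $\tau_{j+1} < \zeta \wedge a_N T$. By part 6 of Proposition 4.1,
\[ \frac{1 - 2\delta}{k_N} - \int_{\tau_j/a_N}^{\tau_{j+1}/a_N} 1_{\{u \in [1, \gamma_{k^*+1}/a_N)\}}\,du \le \int_{\tau_j/a_N}^{\tau_{j+1}/a_N} q(u)\,du \le \frac{1 + 2\delta}{k_N}. \tag{6.5} \]
Therefore, if $u \in (0, T]$ and $\zeta > a_N u$, then, using that $\tau_{k^*+1}/a_N \le 2/k_N$ and $u - \tau_{j'(u)}/a_N \le 2/k_N$
by part 6 of Proposition 4.1 and that $1 \le q(v) \le e$ for all $v \in [0, u]$ by Proposition 4.5, we have
\begin{align*}
\int_0^u q(v)\,dv &\le \int_0^{\tau_{k^*+1}/a_N} q(v)\,dv + \frac{(1 + 2\delta)(j'(u) - (k^* + 1))}{k_N} + \int_{\tau_{j'(u)}/a_N}^{u} q(v)\,dv \\
&\le \frac{(1 + 2\delta)(j'(u) - (k^* + 1))}{k_N} + \frac{4e}{k_N}. \tag{6.6}
\end{align*}
Likewise, using that $\gamma_{k^*+1}/a_N - 1 \le 2/k_N$ by part 6 of Proposition 4.1, the lower bound in (6.5)
implies that
\[ \int_0^u q(v)\,dv \ge \frac{(1 - 2\delta)(j'(u) - (k^* + 1))}{k_N} - \frac{2}{k_N}. \tag{6.7} \]
By definition,
\[ \frac{1}{k_N} = \int_{\tau_j^*/a_N}^{\tau_{j+1}^*/a_N} q\left( \frac{\tau_j^*}{a_N} \right) du. \tag{6.8} \]
By (4.16) and (6.2), if $u \in [\tau_j^*/a_N, \tau_{j+1}^*/a_N)$ and $\tau_{j+1}^*/a_N \le T$, then
\[ \left| q(u) - q\left( \frac{\tau_j^*}{a_N} \right) \right| \le \frac{e(\tau_{j+1}^* - \tau_j^*)}{a_N} \le \frac{e}{k_N} \]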


unless $1 \in (\tau_j^*/a_N, u]$. Combining this observation with (6.8) and (6.2), we get
\[ \left( \frac{1}{k_N} - \frac{e}{k_N^2} \right) 1_{\{1 \notin (\tau_j^*/a_N,\, \tau_{j+1}^*/a_N]\}} \le \int_{\tau_j^*/a_N}^{\tau_{j+1}^*/a_N} q(u)\,du \le \frac{1}{k_N} + \frac{e}{k_N^2}. \tag{6.9} \]
Now (6.2) and (6.9) imply that
\begin{align*}
\int_0^u q(v)\,dv &= \int_{\tau_{k^*+1}^*/a_N}^{\tau_{j^*(u)}^*/a_N} q(v)\,dv + \int_{\tau_{j^*(u)}^*/a_N}^{u} q(v)\,dv \\
&\le \frac{(1 + e/k_N)(j^*(u) - (k^* + 1))}{k_N} + \frac{e}{k_N} \tag{6.10}
\end{align*}
and
\[ \int_0^u q(v)\,dv \ge \frac{(1 - e/k_N)(j^*(u) - (k^* + 1) - 1)}{k_N}. \tag{6.11} \]
Combining (6.7) and (6.10) gives, for sufficiently large $N$,
\begin{align*}
j'(u) - (k^* + 1) &\le \frac{(1 + e/k_N)(j^*(u) - (k^* + 1))}{1 - 2\delta} + \frac{e + 2}{1 - 2\delta} \\
&\le (1 + 3\delta)(j^*(u) - (k^* + 1)) + 5.
\end{align*}
Rearranging this expression, and using that $j^*(u) - (k^* + 1) \le (a_N u)(e k_N/a_N) \le eTk_N$ by (6.2),
we get for sufficiently large $N$,
\[ j'(u) - j^*(u) \le 3\delta(j^*(u) - (k^* + 1)) + 5 \le 9\delta T k_N. \tag{6.12} \]
Likewise, combining (6.6) and (6.11), we get for sufficiently large $N$,
\begin{align*}
j'(u) - (k^* + 1) &\ge \frac{(1 - e/k_N)(j^*(u) - (k^* + 1) - 1)}{1 + 2\delta} - \frac{4e}{1 + 2\delta} \\
&\ge (1 - 3\delta)(j^*(u) - (k^* + 1)) - (4e + 1).
\end{align*}
Rearranging, and again using that $j^*(u) - (k^* + 1) \le eTk_N$, we get
\[ j'(u) - j^*(u) \ge -3\delta(j^*(u) - (k^* + 1)) - (4e + 1) \ge -9\delta T k_N. \tag{6.13} \]
The result (6.3) follows from (6.12) and (6.13).

Finally, to prove (6.4), note that on the event $\{\tau_j < \zeta \wedge a_N T\}$, we have $j^*(\tau_j^*/a_N) = j$ and
$j'(\tau_j/a_N) = j$. Therefore, using (6.3), we have
\[ |j^*(\tau_j/a_N) - j^*(\tau_j^*/a_N)| = |j^*(\tau_j/a_N) - j'(\tau_j/a_N)| \le 9\delta T k_N. \]
Since $|j^*(\tau_j/a_N) - j^*(\tau_j^*/a_N)|$ is the number of points $\tau_i^*$ that land between $\tau_j$ and $\tau_j^*$, it now
follows from (6.2) that for sufficiently large $N$,
\[ |\tau_j - \tau_j^*| \le (9\delta T k_N + 1) \frac{a_N}{k_N} \le 10\delta a_N T, \]
which matches (6.4). □


Define the fixed positive integers
\[ j_1 = j^*(T - (t + 1)) - \lfloor 9\delta T k_N \rfloor, \qquad j_2 = j^*(T - 1 + 19/k_N) + \lfloor 9\delta T k_N \rfloor, \]
and let
\[ I = \{j \in \mathbb{N} : j_1 \le j \le j_2\}. \]
The next result shows that, when tracing ancestral lines back from time $a_N(T - 1)$ to time
$a_N(T - (t + 1))$, we only need to consider time intervals $[\tau_j, \tau_{j+1}]$ for $j \in I$.

Lemma 6.2. On the event $A$, for sufficiently large $N$, we have
\[ a_N(T - (t + 1)) - 10\delta a_N T \le \tau_{j_1}^* \le \tau_{j_2}^* \le a_N(T - 1) + 10\delta a_N T \tag{6.14} \]

and 

aN + w— < Tj 1 < aAr(T - (i + 1)). (6.15) 

Kn 

Also, $L + 9 \le j_2 < J$ and $\tau_{j_2+1} < a_N T$. Furthermore, the cardinality of $I$ is at most $3Tk_N$.

Proof. Throughout the proof, we will work on the event $A$. Using (6.2), we get that for sufficiently
large $N$,
\[ \tau_{j_1}^* \ge a_N(T - (t + 1)) - (9\delta T k_N + 1) \cdot \frac{a_N}{k_N} \ge a_N(T - (t + 1)) - 10\delta a_N T \]
and
\[ \tau_{j_2}^* \le a_N\left( T - 1 + \frac{19}{k_N} \right) + (9\delta T k_N) \cdot \frac{a_N}{k_N} \le a_N(T - 1) + 10\delta a_N T. \]
We have now proved (6.14).

By (6.3), we have $j_1 \le j'(T - (t + 1))$, and thus $\tau_{j_1} \le a_N(T - (t + 1))$, which is the upper
bound in (6.15). To get the lower bound, note that (6.4) and (6.2) give
\[ \tau_{j_1} \ge \tau_{j_1}^* - 10\delta a_N T \ge \tau_{j^*(T - (t+1))}^* - (9\delta T k_N) \cdot \frac{a_N}{k_N} - 10\delta a_N T. \]
Since (6.2) implies $\tau_{j^*(u)}^* \ge a_N u - a_N/k_N$ for $u \in (0, T]$, it follows, using (5.1), that for sufficiently
large $N$,
\begin{align*}
\tau_{j_1} &\ge a_N(T - (t + 1)) - \frac{a_N}{k_N} - 19\delta a_N T \\
&\ge a_N + a_N\big( T - (t + 2) - 20\delta T \big) \\
&\ge a_N + a_N\left( \frac{T - (t + 2)}{2} \right). \tag{6.16}
\end{align*}
The lower bound in (6.15) follows because $\lim_{N \to \infty} k_N = \infty$.

Next, note that by (4.9) and (5.4), we have $\tau_{L+10} \le a_N(T - 1) + 19a_N/k_N$. By (6.3), we
have $j_2 \ge j'(T - 1 + 19/k_N)$, which means $\tau_{j_2+1} > a_N(T - 1 + 19/k_N) \ge \tau_{L+10}$ and thus
$j_2 \ge L + 9$. Also, by (6.2), the number of times $\tau_j^*$ between $a_N(T - 1 + 19/k_N)$ and $a_N T$ is at
least $(k_N/a_N)(a_N(1 - 19/k_N)) - 1 = k_N - 20$. Therefore, using (6.3),
\[ j'(T) \ge j^*(T) - 9\delta T k_N \ge j^*(T - 1 + 19/k_N) + k_N - 20 - 9\delta T k_N \ge j_2 + k_N - 20 - 18\delta T k_N, \]
which is greater than $j_2 + 1$ for sufficiently large $N$ because $\delta < 1/19T$ by (5.1). It follows that
$\tau_{j_2+1} < a_N T$.

Finally, by Remark 4.2 we have $j_2 + 1 \le J$. Also, we have $j_1 \ge k^* + 1$ for sufficiently large
$N$ by (6.16), so $j_2 - j_1 + 1 \le 3Tk_N$, which is equivalent to the last statement of the lemma. □


6.2 The types of the ancestors at time $\tau_j$

Lemma 6.4 below establishes that with high probability, the type $\ell$ individuals in the sample get
traced back to type $\ell - 1$ individuals at time $\tau_\ell$, then to type $\ell - 2$ individuals at time $\tau_{\ell-1}$, and
so on until we have traced the lineages back to time $a_N(T - (t + 1))$. We begin with the following
preliminary result.

Lemma 6.3. Let $j \in I$. Let $K_j$ be the number of type $j$ individuals in the population at time
$\tau_{j+1}$ whose ancestor in the population at time $\tau_j$ does not have type $j - 1$. Then
\[ E\big[ K_j 1_{\{\tau_{j+1} < \zeta\}} \big] \le 5 \left( \frac{s}{\mu} \right)^{1 - 1/3k_N}. \]


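Proof. By parts 1 and 6 of Proposition 4.1, on $\{\tau_{j+1} < \zeta\}$, no individual of type $j$ or higher in
the population at time $\tau_j$ has a descendant alive in the population at time $\tau_{j+1}$. Therefore, $K_j$ is
the number of type $j$ individuals at time $\tau_{j+1}$ whose ancestor at time $\tau_j$ has type less than $j - 1$.
Such an individual must be descended from an individual that gets its $(j-1)$st mutation after
time $\tau_j$. We will therefore consider the number of type $j - 1$ individuals at times $t \ge \tau_j$ that
are descended from individuals that acquired their $(j-1)$st mutation between times $\tau_j$ and $\tau_{j+1}$.
Following Corollary 4.7, we denote the number of such individuals by $X_{j-1}^{\tau_j, \tau_{j+1}}(t)$. Then, writing
$\zeta_j = \zeta \wedge \tau_{j+1}$,
\begin{align*}
&e^{-\int_{\tau_j}^{(\tau_j + u) \wedge \zeta_j} G_{j-1}(v)\,dv} X_{j-1}^{\tau_j, \tau_{j+1}}((\tau_j + u) \wedge \zeta_j) \\
&\qquad = \int_{\tau_j}^{(\tau_j + u) \wedge \zeta_j} \mu X_{j-2}(w) e^{-\int_{\tau_j}^{w} G_{j-1}(v)\,dv}\,dw + Z_{j-1}^{\tau_j, \tau_{j+1}}((\tau_j + u) \wedge \zeta_j), \tag{6.17}
\end{align*}
where $(Z_{j-1}^{\tau_j, \tau_{j+1}}(\tau_j + u), u \ge 0)$ is a mean zero martingale. By (4.7), on the event $\{\zeta_j > \tau_j\}$, we
have for $u \ge 0$,
\begin{align*}
\int_{\tau_j}^{(\tau_j + u) \wedge \zeta_j} \mu X_{j-2}(w) e^{-\int_{\tau_j}^{w} G_{j-1}(v)\,dv}\,dw &\le \int_{\tau_j}^{(\tau_j + u) \wedge \zeta_j} (1 + \delta) s\, e^{\int_{\tau_{j-1}}^{w} G_{j-2}(v)\,dv - \int_{\tau_j}^{w} G_{j-1}(v)\,dv}\,dw \\
&= (1 + \delta) s\, e^{\int_{\tau_{j-1}}^{\tau_j} G_{j-2}(v)\,dv} \int_{\tau_j}^{(\tau_j + u) \wedge \zeta_j} e^{-s(w - \tau_j)}\,dw \\
&\le (1 + \delta) e^{\int_{\tau_{j-1}}^{\tau_j} G_{j-2}(v)\,dv}. \tag{6.18}
\end{align*}
By (4.9) and part 4 of Proposition 4.3, on $\{\zeta_j > \tau_j\}$ we have
\[ e^{\int_{\tau_{j-1}}^{\tau_j} G_{j-2}(v)\,dv} = e^{-s(\tau_j - \tau_{j-1})} e^{\int_{\tau_{j-1}}^{\tau_j} G_{j-1}(v)\,dv} \le e^{-s(a_N/3k_N)} \left( \frac{2s}{\mu} \right) = 2 \left( \frac{s}{\mu} \right)^{1 - 1/3k_N}. \tag{6.19} \]
Taking conditional expectations on both sides of (6.17) and then using (6.18) and (6.19) gives
\[ E\left[ e^{-\int_{\tau_j}^{(\tau_j + u) \wedge \zeta_j} G_{j-1}(v)\,dv} X_{j-1}^{\tau_j, \tau_{j+1}}((\tau_j + u) \wedge \zeta_j) \,\Big|\, \mathcal{F}_{\tau_j} \right] \le 2(1 + \delta) \left( \frac{s}{\mu} \right)^{1 - 1/3k_N}. \tag{6.20} \]
For $u \ge \tau_j$, let $X_j^*(u)$ denote the number of type $j$ individuals in the population at time
$u$ that got their $(j-1)$st mutation after time $\tau_j$. Note that $K_j = X_j^*(\tau_{j+1})$ on $\{\tau_{j+1} < \zeta\}$. By
the reasoning that leads to Corollary 4.7, we get
\[ e^{-\int_{\tau_j}^{u \wedge \zeta_j} G_j(v)\,dv} X_j^*(u \wedge \zeta_j) = \int_{\tau_j}^{u \wedge \zeta_j} \mu X_{j-1}^{\tau_j, \tau_{j+1}}(w) e^{-\int_{\tau_j}^{w} G_j(v)\,dv}\,dw + Z_j^*(u \wedge \zeta_j), \tag{6.21} \]
where $(Z_j^*(\tau_j + u), u \ge 0)$ is a mean zero martingale. By part 4 of Proposition 4.3, on $\{\tau_{j+1} < \zeta\}$,
we have
\[ e^{\int_{\tau_j}^{\tau_{j+1}} G_j(v)\,dv} \le \frac{2s}{\mu}. \]
Therefore, using that the expression in (6.21) is nonnegative,
\[ X_j^*(\tau_{j+1}) 1_{\{\tau_{j+1} < \zeta\}} \le \frac{2s}{\mu} \left( \int_{\tau_j}^{\zeta_j} \mu X_{j-1}^{\tau_j, \tau_{j+1}}(w) e^{-\int_{\tau_j}^{w} G_j(v)\,dv}\,dw + Z_j^*(\zeta_j) \right). \]
Taking conditional expectations of both sides and using Fubini's Theorem and (6.20),
\begin{align*}
E\big[ X_j^*(\tau_{j+1}) 1_{\{\tau_{j+1} < \zeta\}} \,\big|\, \mathcal{F}_{\tau_j} \big] &\le \frac{2s}{\mu} \int_0^\infty 2(1 + \delta) \mu \left( \frac{s}{\mu} \right)^{1 - 1/3k_N} e^{-sw}\,dw \\
&= 4(1 + \delta) \left( \frac{s}{\mu} \right)^{1 - 1/3k_N}.
\end{align*}
Taking expectations of both sides gives the result of the lemma. □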


Lemma 6.4. We have
\[ \lim_{N \to \infty} P\big( A \cap \{ U_i(\tau_j) \ne j - 1 \text{ for some } i \in \{1, \dots, n\} \text{ and } j \in I \text{ with } j \le U_i(a_N T) \} \big) = 0. \]

Proof. Fix $i \in \{1, \dots, n\}$. Suppose $A$ occurs and $U_i(\tau_j) \ne j - 1$ for some $j \in I$ with $j \le U_i(a_N T)$.
Then either $U_i(\tau_{U_i(a_N T)+1}) \ne U_i(a_N T)$, an event whose probability tends to zero as $N \to \infty$ by
Lemma 5.2, or else there is an integer $j \in I$ with $j \le U_i(a_N T)$ such that $U_i(\tau_j) \ne j - 1$ and
$U_i(\tau_{j+1}) = j$. Therefore, to prove the lemma, it suffices to show that
\[ \lim_{N \to \infty} \sum_{j \in I} P\big( A \cap \{ U_i(\tau_j) \ne j - 1 \} \cap \{ U_i(\tau_{j+1}) = j \} \big) = 0. \tag{6.22} \]
Fix $j \in I$. Recall from Lemma 6.3 that $K_j$ is the number of type $j$ individuals in the
population at time $\tau_{j+1}$ whose ancestor in the population at time $\tau_j$ does not have type $j - 1$.
Note that the probability, conditional on $\mathcal{F}_{\tau_{j+1}}$, that a randomly chosen type $j$ individual at time
$\tau_{j+1}$ is not descended from a type $j - 1$ individual at time $\tau_j$ is $K_j/\lceil s/\mu \rceil$. Also, conditional on
$\mathcal{F}_{\tau_{j+1}}$, the $\lceil s/\mu \rceil$ type $j$ individuals at time $\tau_{j+1}$ are equally likely to be the ancestor of the $i$th
individual in the sample taken at time $a_N T$. Therefore, since $K_j$ is $\mathcal{F}_{\tau_{j+1}}$-measurable, on the
event $\{\tau_{j+1} < \zeta\}$ we have
\[ P\big( \{ U_i(\tau_j) \ne j - 1 \} \cap \{ U_i(\tau_{j+1}) = j \} \,\big|\, \mathcal{F}_{\tau_{j+1}} \big) = P\big( U_i(\tau_{j+1}) = j \,\big|\, \mathcal{F}_{\tau_{j+1}} \big) \cdot \frac{K_j}{\lceil s/\mu \rceil}. \]
Therefore, multiplying both sides by $1_{\{\tau_{j+1} < \zeta\}}$, taking expectations, and using Lemma 6.3, we
get
\[ P\big( \{ U_i(\tau_j) \ne j - 1 \} \cap \{ U_i(\tau_{j+1}) = j \} \cap \{ \tau_{j+1} < \zeta \} \big) \le \frac{\mu}{s} E\big[ K_j 1_{\{\tau_{j+1} < \zeta\}} \big] \le 5 \left( \frac{s}{\mu} \right)^{-1/3k_N}. \]
Since the cardinality of $I$ is at most $3Tk_N$ by Lemma 6.2, it follows that the sum of the
probabilities on the left-hand side of (6.22) is at most
\[ 15 T k_N \left( \frac{s}{\mu} \right)^{-1/3k_N}. \]
To check that this expression goes to zero as $N \to \infty$, we consider the logarithm. Note that
$\log(k_N (s/\mu)^{-1/3k_N}) = \log k_N - (1/3k_N)\log(s/\mu)$, which tends to $-\infty$ as $N \to \infty$ by assumption
A2. In view of the discussion before equation (6.22), the result of the lemma follows. □
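The last step is a comparison of growth rates, which a few numbers make concrete. The Python sketch below evaluates the logarithm of $15Tk_N(s/\mu)^{-1/3k_N}$ along a hypothetical sequence in which $k_N = n$ and $\log(s/\mu) = n^2$. This scaling is an assumption made only for illustration (it makes $(1/3k_N)\log(s/\mu)$ grow faster than $\log k_N$) and is not taken from assumptions A1-A3.

```python
import math


def log_union_bound(T, k_N, log_s_over_mu):
    """Logarithm of 15 * T * k_N * (s / mu) ** (-1 / (3 * k_N))."""
    return math.log(15.0 * T * k_N) - log_s_over_mu / (3.0 * k_N)


if __name__ == "__main__":
    T = 5.0
    for n in (10, 100, 1000, 10000):
        # Hypothetical scaling: k_N = n and log(s / mu) = n ** 2.
        print(n, log_union_bound(T, n, float(n) ** 2))
```

Working with the logarithm avoids underflow when $(s/\mu)^{-1/3k_N}$ is extremely small.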

6.3 Coalescence between times $\tau_j$ and $\tau_{j+1}$

We next consider the merging of ancestral lines between times $\tau_j$ and $\tau_{j+1}$. It will suffice to
consider the lineages labelled 1 and 2. In view of Lemma 6.4, we may also assume these lineages
have type $j$ at time $\tau_{j+1}$ and type $j - 1$ at time $\tau_j$, which will occur with high probability.
Recall the definitions of $V_{i,j}$ and $T_{i,j}$ from (5.2) and (5.3). Also, let $V_j = \min\{V_{1,j}, V_{2,j}\}$ and
$V_j^* = \max\{V_{1,j}, V_{2,j}\}$. Because only lineages of the same type can coalesce, there are only three
ways that these lineages could coalesce between times $\tau_j$ and $\tau_{j+1}$:

1. Two lineages at time $\tau_{j+1}$ could be traced back to one individual that acquires its $j$th
mutation between times $\xi_j$ and $\tau_{j+1}$. That is, $\xi_j < V_{1,j} = V_{2,j} < T_{1,2} < \tau_{j+1}$.

2. Two lineages at time $\tau_{j+1}$ could be traced back to one individual that acquires its $j$th
mutation before time $\xi_j$. That is, $\tau_j < V_{1,j} = V_{2,j} < \xi_j$ and $V_{1,j} = V_{2,j} < T_{1,2} < \tau_{j+1}$.

3. Two lineages at time $\tau_{j+1}$ could be descended from different type $j$ mutations between
times $\tau_j$ and $\tau_{j+1}$, but then the two type $j - 1$ lineages could coalesce before time $\tau_j$. That
is, $\tau_j < T_{1,2} < V_j < V_j^* < \tau_{j+1}$.

We will now show that only coalescence events of the second type need to be considered. Lemma
6.5 rules out case 1 above, and Lemma 6.6 rules out case 3.

Lemma 6.5. Define the event
\[ A_j' = \{ U_1(\tau_{j+1}) = U_2(\tau_{j+1}) = j \} \cap \{ U_1(\tau_j) = U_2(\tau_j) = j - 1 \} \cap \{ \xi_j < V_{1,j} = V_{2,j} < T_{1,2} < \tau_{j+1} \}. \]
For sufficiently large $N$, we have
\[ P\Big( A \cap \bigcup_{j \in I} A_j' \Big) \le CTe^{-b}. \]

Proof. Fix $j \in I$. Let $H_j$ be the number of type $j$ mutations between times $\xi_j$ and $\tau_{j+1}$. Let
$0 < \kappa_1 < \kappa_2 < \dots < \kappa_{H_j}$ denote the times at which these mutations occur. Let $X_{j,2,i}(u)$ be
the number of type $j$ individuals at time $u$ descended from the individual that acquires its $j$th
mutation at time $\kappa_i$. This means that
\[ X_{j,2}(\tau_{j+1}) = \sum_{i=1}^{H_j} X_{j,2,i}(\tau_{j+1}). \]
Conditional on $X_{j,2,1}(\tau_{j+1}), \dots, X_{j,2,H_j}(\tau_{j+1})$, the probability that two randomly chosen
individuals at time $\tau_{j+1}$ are descended from the same individual that gets its $j$th mutation between
times $\xi_j$ and $\tau_{j+1}$ is
\[ \frac{1}{\lceil s/\mu \rceil (\lceil s/\mu \rceil - 1)} \sum_{i=1}^{H_j} X_{j,2,i}(\tau_{j+1}) \big( X_{j,2,i}(\tau_{j+1}) - 1 \big) \le \frac{\mu^2}{s^2} \sum_{i=1}^{H_j} X_{j,2,i}(\tau_{j+1})^2. \]
Since, conditional on $\mathcal{F}_{\tau_{j+1}}$, each of the $\lceil s/\mu \rceil$ type $j$ individuals at time $\tau_{j+1}$ is equally likely to
be the ancestor of an individual in our sample at time $a_N T$, it follows that on $\{\tau_{j+1} < \zeta\}$,
\[ P(A_j' \,|\, \mathcal{F}_{\tau_{j+1}}) \le \frac{\mu^2}{s^2} \sum_{i=1}^{H_j} X_{j,2,i}(\tau_{j+1})^2. \tag{6.23} \]
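The combinatorial step behind (6.23) can be checked on a small example. In the Python sketch below (illustrative only; the family sizes are hypothetical), `same_family_probability` computes the exact probability that two distinct individuals chosen uniformly from a population of size `n_total` belong to the same one of the listed families, and `square_bound` computes the upper bound obtained by replacing $X(X-1)/(n(n-1))$ with $X^2/n^2$.

```python
from fractions import Fraction


def same_family_probability(family_sizes, n_total):
    """P(two distinct uniformly chosen individuals lie in the same listed family)."""
    if sum(family_sizes) > n_total or n_total < 2:
        raise ValueError("family sizes must fit in a population of at least 2")
    return sum(Fraction(x * (x - 1), n_total * (n_total - 1)) for x in family_sizes)


def square_bound(family_sizes, n_total):
    """Upper bound sum_i X_i^2 / n^2, valid because X_i <= n for every family."""
    return sum(Fraction(x * x, n_total * n_total) for x in family_sizes)


if __name__ == "__main__":
    # Hypothetical family sizes; the remaining 11 individuals belong to no listed family.
    sizes, n_total = [4, 3, 1, 1], 20
    print(same_family_probability(sizes, n_total), square_bound(sizes, n_total))
```

The bound holds term by term because $X(X-1)/(n(n-1)) \le X^2/n^2$ is equivalent to $X \le n$.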


i= 1 


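Therefore, multiplying both sides by $1_{\{\tau_{j+1} < \zeta\}}$ and taking expectations,
\[ P\big( A_j' \cap \{ \tau_{j+1} < \zeta \} \big) \le \frac{\mu^2}{s^2} E\left[ \sum_{i=1}^{H_j} X_{j,2,i}(\tau_{j+1})^2 1_{\{\tau_{j+1} < \zeta\}} \right]. \tag{6.24} \]
We now bound the expectation on the right-hand side of (6.24). Write $\zeta_j = \zeta \wedge \tau_{j+1}$. By
Corollary 4.8 applied with $\kappa_i$ playing the role of $\kappa$ and the single type $j$ individual that acquires
its $j$th mutation at time $\kappa_i$ playing the role of $S$, we get
\[ X_{j,2,i}(u \wedge \zeta_j) = e^{\int_{\kappa_i}^{u \wedge \zeta_j} G_j(v)\,dv} \big( 1 + Z_{i,j}(u) \big), \tag{6.25} \]
where $(Z_{i,j}(\kappa_i + u), u \ge 0)$ is a mean zero martingale. Therefore, using part 4 of Proposition 4.3,
we get that on $\{\kappa_i < \tau_{j+1} < \zeta\}$,
\begin{align*}
X_{j,2,i}(\tau_{j+1}) &= e^{\int_{\tau_j}^{\tau_{j+1}} G_j(v)\,dv} e^{-\int_{\tau_j}^{\kappa_i} G_j(v)\,dv} \big( 1 + Z_{i,j}(\tau_{j+1}) \big) \\
&\le \frac{2s}{\mu} e^{-\int_{\tau_j}^{\kappa_i} G_j(v)\,dv} \big( 1 + Z_{i,j}(\tau_{j+1}) \big). \tag{6.26}
\end{align*}
Corollary 4.8 combined with (4.20) and (6.25) gives that on $\{\kappa_i < \tau_{j+1}\}$,
\begin{align*}
\mathrm{Var}(Z_{i,j}(\tau_{j+1}) \,|\, \mathcal{F}_{\kappa_i}) &\le 3E\left[ \int_{\kappa_i}^{\tau_{j+1} \wedge \zeta_j} e^{-2\int_{\kappa_i}^{u} G_j(v)\,dv} X_{j,2,i}(u)\,du \,\Big|\, \mathcal{F}_{\kappa_i} \right] \\
&= 3E\left[ \int_{\kappa_i}^{\tau_{j+1} \wedge \zeta_j} e^{-\int_{\kappa_i}^{u} G_j(v)\,dv} \big( 1 + Z_{i,j}(u) \big)\,du \,\Big|\, \mathcal{F}_{\kappa_i} \right].
\end{align*}
Because $k^* + 1 \le j \le J$ by Lemma 6.2, it follows from part 2 of Proposition 4.3 that for sufficiently
large $N$, if $v \in [\tau_j, \tau_{j+1}]$ and $v < \zeta$, then $G_j(v) \ge (1 - 2\delta) s k_N$. Therefore, on $\{\kappa_i < \tau_{j+1}\}$,
\[ \mathrm{Var}(Z_{i,j}(\tau_{j+1}) \,|\, \mathcal{F}_{\kappa_i}) \le 3E\left[ \int_{\kappa_i}^{\infty} e^{-s k_N (1 - 2\delta)(u - \kappa_i)} \big( 1 + Z_{i,j}(u) \big)\,du \,\Big|\, \mathcal{F}_{\kappa_i} \right] = \frac{3}{s k_N (1 - 2\delta)}. \tag{6.27} \]
From (6.26) and (6.27), we get that on $\{\kappa_i < \tau_{j+1}\}$,
\[ E\big[ X_{j,2,i}(\tau_{j+1})^2 1_{\{\tau_{j+1} < \zeta\}} \,\big|\, \mathcal{F}_{\kappa_i} \big] \le \frac{4s^2}{\mu^2} e^{-2\int_{\tau_j}^{\kappa_i} G_j(v)\,dv} \left( 1 + \frac{3}{s k_N (1 - 2\delta)} \right). \]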

By assumption A3, the second term inside the parentheses dominates when N is large. Also, by 
part 1 of Proposition 14.31 we have s(qj — C3) < Gj(v ) < s(qj + C3) if Tj < v < Q. Therefore, for 
sufficiently large N, on (k ? ; < Tj+i}, 


Cs 


E[X] xi ( T, + ,)l( w «>h«<] < 


(6.28)
Next, we condition on $\mathcal{F}_{\tau_j}$. Using (4.7) followed by part 1 of Proposition 4.3, the rate at
which type $j$ mutations are appearing at time $u$, provided that $\tau_j < u < \zeta$, is

\[ \mu X_{j-1}(u) \le (1 + \delta) s e^{\int_{\tau_j}^u G_{j-1}(v) \, dv} \le (1 + \delta) s e^{s(q_j + C_3 - 1)(u - \tau_j)}. \tag{6.29} \]


Therefore, using (16.281) . (16.291) . and (14.91) . we get that on {t 3 < £}, 


E 


Hi 


T 

S-Tj 


H X jMwf ) 1 {r,+i<C} 

3 = 1 7 

r(Tj+2a N /k N )/\( si 

(1 + 5)se s fe+ c 3-i)(«-^) . __ e -Mq j -c 3 )( u ~T j ) du 


< 


H 2 k N 


Hq 2 r(Tj+2a N /k N )/\( 

< / e - sfe -3C 3 +l)(«-T,.) ^ 


< 


Cs 


fJ. 2 k N (qj - 3C 3 + 1) 


. p-s(?j-3C'3+l)(&-T # ) 


(6.30) 


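Note that $e^{-s q_j (\xi_j - \tau_j)} = s q_j e^{-b}$ for sufficiently large $N$ by (4.3). Also, $q_j \ge (1 - 2\delta) k_N$ on $\{\tau_j < \zeta\}$
for sufficiently large $N$ by part 3 of Proposition 4.3, so

\[ s(\xi_j - \tau_j) = \frac{1}{q_j} \bigg( \log \Big( \frac{1}{s q_j} \Big) + b \bigg) \le \frac{1}{(1 - 2\delta) k_N} \bigg( \log \Big( \frac{1}{(1 - 2\delta) s k_N} \Big) + b \bigg) \to 0 \tag{6.31} \]

as $N \to \infty$ by assumption A1. Therefore, $e^{(3C_3 - 1) s (\xi_j - \tau_j)} \to 1$ as $N \to \infty$. Combining these
observations with (6.30), and using that $\tau_{j+1} < \zeta$ implies $\tau_j < \zeta$ in view of (4.9), we get that for
sufficiently large $N$,

\[ E\bigg[ \sum_{i=1}^{H_j} X_{j,2,i}(\tau_{j+1})^2 1_{\{\tau_{j+1} < \zeta\}} \,\bigg|\, \mathcal{F}_{\tau_j} \bigg] \le \frac{C s^2 e^{-b}}{\mu^2 k_N}. \tag{6.32} \]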


Finally, we can take expectations of both sides in (6.32) and combine the result with (6.24) and
the fact that the cardinality of $I$ is at most $3 k_N T$ by Lemma 6.2 to obtain the result of the
lemma. □


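Lemma 6.6. Recall that $V_j = \min\{V_{1,j}, V_{2,j}\}$. Define the event

\[ A^*_j = \{U_1(\tau_j) = U_2(\tau_j) = j - 1\} \cap \{\tau_j < T_{1,2} < V_j < \tau_{j+1}\}. \]

Then

\[ \lim_{N \to \infty} P\bigg( A \cap \bigcup_{j \in I} A^*_j \bigg) = 0. \]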


Proof. Fix $j \in I$. Randomly label the type $j - 1$ individuals at time $\tau_j$ by $1, 2, \dots, \lceil s/\mu \rceil$. For
$t \ge \tau_j$, let $X^i_{j-1}(t)$ denote the number of type $j - 1$ individuals at time $t$ descended from the type
$j - 1$ individual labelled $i$ at time $\tau_j$.

Let $\mathcal{C}_j$ be the $\sigma$-field generated by the random variables $V_{1,j}$, $V_{2,j}$, $X^1_{j-1}(V_j-), \dots, X^{\lceil s/\mu \rceil}_{j-1}(V_j-)$
and the event $\{V_j < \zeta\}$. The only way that $A^*_j$ can occur is if the first two lineages get traced
back to distinct type $j - 1$ ancestors at time $V_j-$ and then merge between times $\tau_j$ and $V_j-$.
Conditional on $\mathcal{C}_j$, we know that one of the type $j - 1$ individuals at time $V_j-$ will get a $j$th
mutation at time $V_j$, but all of the type $j - 1$ individuals at time $V_j-$ are equally likely to be
ancestors of individuals in our sample at time $a_N T$. Therefore, using the notation from (5.13),


P(A*t {Vj<c} \Cj) < 


f x t j_ 1 (v j —)(x t j _ 1 {y j —) — i) 

Vfe Xj^Vj-KXj-iiVj-)- 1) 


1 {v j <0 ^ 


E R h- 1- 

i=1 


It follows from part 6 of Proposition 14.11 that t^* + k+ i < {K + l)(2ajv /k]\r) < ajv for sufficiently 
large N, and therefore by (16.151) . we have j\ — 1 > k* + 1 + K. Thus, summing over j E /, taking 
expectations of both sides, and applying Lemma [-7731 and Lemma HOI we get 


E p (4‘ n{r, <C})<E 

jei j el 


C-n 

s 2 k]\r 


< 


C[iT 


(6.33) 


The right-hand side of (6.33) tends to zero as $N \to \infty$ by (2.3). The result of the lemma follows
because if $A \cap A^*_j$ occurs for some $j \in I$, then $V_j < \tau_{j+1} < \zeta$ by Lemma 6.2. □


7 Coupling with a branching process between times $\tau_j$ and $\tau_{j+1}$

Recall from Lemmas 6.5 and 6.6 and the discussion before Lemma 6.5 that we have shown that
all possible coalescence events have low probability, except for the possibility that type $j$ lineages
at time $\tau_{j+1}$ could be descended from the same type $j$ mutation between times $\tau_j$ and $\xi_j$. In this
section, we study these early type $j$ mutations in depth. The strategy here will be to couple the
descendants of these mutations with a supercritical branching process.


7.1 Review of results on continuous-time branching processes 

Consider a continuous-time birth and death process $(Z(t), t \ge 0)$ in which each individual
independently dies at rate $\nu \ge 0$ and gives birth to a new individual at rate $\lambda > \nu$. Assume $Z(0) = 1$.
Using results in [1], one can show that

\[ P(Z(t) > 0) = \frac{\lambda - \nu}{\lambda - \nu e^{-(\lambda - \nu)t}}, \tag{7.1} \]


which is also stated as part of Lemma 8.16 of m ■ Let q denote the probability that the population 
eventually goes extinct. By letting $t \to \infty$ in (7.1), we get

\[ 1 - q = \frac{\lambda - \nu}{\lambda}. \tag{7.2} \]
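As a quick numerical illustration of (7.1) and (7.2), the following sketch evaluates the survival probability at several times and compares it with its limit. The function name and the rate values are illustrative assumptions and do not come from the paper.

```python
import math


def survival_probability(birth_rate: float, death_rate: float, t: float) -> float:
    """P(Z(t) > 0) from (7.1) for a birth and death process with Z(0) = 1."""
    growth = birth_rate - death_rate
    return growth / (birth_rate - death_rate * math.exp(-growth * t))


# Hypothetical supercritical rates (birth rate larger than death rate).
lam, nu = 2.0, 1.0
for t in (0.0, 1.0, 5.0, 50.0):
    print(t, survival_probability(lam, nu, t))

# The limiting survival probability 1 - q from (7.2).
limit = (lam - nu) / lam
```

The printed values decrease from 1 at time zero towards the limit $1 - q = (\lambda - \nu)/\lambda$.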


Let $W(t) = e^{-(\lambda - \nu)t} Z(t)$. It is well-known (see, for example, section 7 of Chapter III in [1])
that $(W(t), t \ge 0)$ is a martingale, and there is a random variable $W$ such that

\[ \lim_{t \to \infty} W(t) = W \quad \text{a.s.}, \tag{7.3} \]

where $W$ is zero on the event that the branching process goes extinct and is almost surely strictly
positive on the event that the branching process survives forever. In this instance, it is also known
that the conditional distribution of $W$ given that the branching process survives forever is the
exponential distribution with rate parameter $1 - q$, so that if $x \ge 0$, then

\[ P(W > x) = (1 - q) e^{-(1 - q)x}. \tag{7.4} \]


This can be derived from results in p] and is also worked out, for example, in 1141 . Recall that if 
$S$ has an exponential distribution with parameter $\lambda$, then $E[S] = 1/\lambda$ and $E[S^2] = 2/\lambda^2$. Because
$P(W > 0) = 1 - q$, it follows that $E[W] = 1$ and $\mathrm{Var}(W) \le E[W^2] = 2/(1 - q)$. We will need
the following result concerning the rate of convergence of $W(t)$ to $W$.
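For readers who want the intermediate step, the two moment identities follow by conditioning on survival. The short computation below is only a worked restatement of the facts quoted above, under the assumption that $W$ is exponential with rate $1 - q$ on the survival event and zero otherwise.

```latex
\begin{align*}
E[W]   &= P(W > 0)\, E[W \mid W > 0]   = (1 - q) \cdot \frac{1}{1 - q} = 1, \\
E[W^2] &= P(W > 0)\, E[W^2 \mid W > 0] = (1 - q) \cdot \frac{2}{(1 - q)^2} = \frac{2}{1 - q}.
\end{align*}
```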


Lemma 7.1. For all $\eta > 0$ and $t \ge 0$, we have

\[ P(|W(t) - W| > \eta) \le \frac{2 e^{-(\lambda - \nu)t}}{\eta^2 (1 - q)}. \]


Proof. Conditional on $Z(t)$, we can consider separately the descendants of the $Z(t)$ individuals
at time $t$ to see that

\[ W = e^{-(\lambda - \nu)t} \sum_{i=1}^{Z(t)} W_i, \]

where the random variables $W_1, \dots, W_{Z(t)}$ are independent and have the same distribution as $W$
(see section 10 in Chapter III of [1]). It follows that

\[ E[W \mid Z(t)] = e^{-(\lambda - \nu)t} Z(t) E[W] = W(t) \]

and

\[ \mathrm{Var}(W \mid Z(t)) = e^{-2(\lambda - \nu)t} Z(t) \mathrm{Var}(W) \le \frac{2 e^{-(\lambda - \nu)t} W(t)}{1 - q}. \]

Therefore, by Chebyshev's Inequality,

\[ P(|W - W(t)| > \eta \mid Z(t)) \le \frac{\mathrm{Var}(W \mid Z(t))}{\eta^2} \le \frac{2 e^{-(\lambda - \nu)t} W(t)}{\eta^2 (1 - q)}. \tag{7.5} \]

Because $E[W(t)] = 1$, the result follows by taking expectations of both sides in (7.5). □
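The martingale normalization and the bound in Lemma 7.1 can be explored with a small simulation. The sketch below is only an illustration under assumed parameter values: it simulates the birth and death process with a standard Gillespie scheme, estimates $E[W(t)]$ (which should be close to 1), and evaluates the right-hand side of Lemma 7.1. All function names are hypothetical.

```python
import math
import random


def simulate_population(birth_rate, death_rate, t_end, rng):
    """Gillespie simulation of Z(t_end) for a birth and death process with Z(0) = 1."""
    z, t = 1, 0.0
    while z > 0:
        # The waiting time to the next event is exponential with rate z * (birth + death).
        t += rng.expovariate(z * (birth_rate + death_rate))
        if t > t_end:
            break
        if rng.random() < birth_rate / (birth_rate + death_rate):
            z += 1
        else:
            z -= 1
    return z


def mean_of_w(birth_rate, death_rate, t_end, runs, seed=0):
    """Monte Carlo estimate of E[W(t_end)], where W(t) = exp(-(birth - death) t) Z(t)."""
    rng = random.Random(seed)
    scale = math.exp(-(birth_rate - death_rate) * t_end)
    total = sum(simulate_population(birth_rate, death_rate, t_end, rng) for _ in range(runs))
    return scale * total / runs


def lemma_7_1_bound(birth_rate, death_rate, eta, t):
    """Right-hand side of Lemma 7.1, using 1 - q = (birth - death) / birth from (7.2)."""
    one_minus_q = (birth_rate - death_rate) / birth_rate
    return 2.0 * math.exp(-(birth_rate - death_rate) * t) / (eta ** 2 * one_minus_q)


# Hypothetical rates, chosen only for illustration.
estimate = mean_of_w(2.0, 1.0, t_end=3.0, runs=2000)
print(estimate, lemma_7_1_bound(2.0, 1.0, eta=0.5, t=3.0))
```

The bound decays exponentially in $t$ at rate $\lambda - \nu$, so it is informative only once $t$ is large compared with $1/(\lambda - \nu)$.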


7.2 A branching process coupling between times $\tau_j$ and $\tau_{j+1}$

We will assume now that $j \in I$, which by Lemma 6.2 ensures that $\tau_{j+1} < a_N T$ on $A$. Recalling
Corollary 4.7, we will let $X'_j(t)$
denote the number of type $j$ individuals at time $t$ that are descended from individuals that
acquired a $j$th mutation during the time interval $(\tau_j, \xi_j]$. Note that $X'_j(t) = X_{j,1}(t)$, as long as
there are no type $j$ mutations before time $\tau_j$. We say there is a pure birth event at time $t$ if
$X'_j(t) = X'_j(t-) + 1$ and a pure death event at time $t$ if $X'_j(t) = X'_j(t-) - 1$. We say there is
a birth and death event at time $t$ if one of the $X'_j(t-)$ individuals at time $t-$ gives birth and
another dies, so that $X'_j(t) = X'_j(t-)$. Let $B'_j(t)$ and $D'_j(t)$ denote the expressions in (4.18) and
(4.19) respectively with $X'_j(t)$ in place of $X_j(t)$. Recall from the discussion surrounding (4.17),
(4.18), and (4.19) that if $t \in [\tau_j, \tau_{j+1} \wedge \zeta)$, then the rate at which a particular type $j$ individual
gives birth as part of a pure birth event is


\[ B'_j(t) = \bigg( 1 - \frac{X'_j(t)}{N} \bigg) (1 + s(j - M(t))), \tag{7.6} \]

while the rate at which a particular type $j$ individual is involved in a pure death event is

\[ D'_j(t) = 1 + \mu - \frac{X'_j(t)}{N} (1 + s(j - M(t))). \tag{7.7} \]

Also, the rate at which a particular type $j$ individual gives birth as part of a birth and death
event and the rate at which a particular type $j$ individual dies as part of a birth and death event
are both equal to

\[ O_j(t) = \frac{X'_j(t)}{N} (1 + s(j - M(t))). \tag{7.8} \]

We write $B^*_j(t) = B'_j(t) + O_j(t) = 1 + s(j - M(t))$ and $D^*_j(t) = D'_j(t) + O_j(t) = 1 + \mu$ for the total
birth and death rates respectively. The following lemma gives upper and lower bounds on these
birth and death rates. The lemma also gives a bound on the rate of type $j$ mutations, which will
correspond to immigration in our branching process.
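The way the total birth rate $1 + s(j - M(t))$ and the total death rate $1 + \mu$ split into pure events and birth and death events can be checked numerically. The sketch below assumes that the per-individual rate of birth and death events is $X'_j(t)(1 + s(j - M(t)))/N$, which is one reading of (7.8); the class name, function name, and parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TypeJRates:
    pure_birth: float       # rate of giving birth in a pure birth event
    pure_death: float       # rate of being involved in a pure death event
    birth_and_death: float  # rate of giving birth (or dying) in a birth and death event


def type_j_rates(x_early, n, s, mu, j, mean_type):
    """Per-individual rates for an early type j family of size x_early (assumed form)."""
    fitness = 1.0 + s * (j - mean_type)
    overlap = (x_early / n) * fitness
    return TypeJRates(
        pure_birth=fitness - overlap,
        pure_death=1.0 + mu - overlap,
        birth_and_death=overlap,
    )


# Hypothetical values: family of 50 in a population of 10,000.
rates = type_j_rates(x_early=50, n=10_000, s=0.01, mu=1e-5, j=12, mean_type=10.0)
total_birth = rates.pure_birth + rates.birth_and_death  # should equal 1 + s (j - M)
total_death = rates.pure_death + rates.birth_and_death  # should equal 1 + mu
```

When the family is a small fraction of the population, the overlap term is small, which is why the pure birth and death rates are close to the total rates.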

Lemma 7.2. There is a positive constant $C_4$ such that for sufficiently large $N$, if $X'_j(t) \le s/(2\mu)$
and $t \in [\tau_j, \tau_{j+1} \wedge \zeta)$, then the following hold:

\[ 1 - s \le D'_j(t) \le D^*_j(t) = 1 + \mu, \tag{7.9} \]

\[ 1 + s q_j - C_4 s \le B'_j(t) \le B^*_j(t) \le 1 + s q_j + C_4 s, \tag{7.10} \]

\[ (1 - \delta) s e^{s(q_j - C_4)(t - \tau_j)} \le \mu X_{j-1}(t) \le (1 + \delta) s e^{s(q_j + C_4)(t - \tau_j)}. \tag{7.11} \]

Proof. Suppose X'-{t) < s/2/a and t e [r ? -,r 7+1 A (). By (12.31) . assumption A3, and the fact that 

j < J by Lemma 16.21 for sufficiently large N we have 

°‘ {t) S ( 2^) (1 + £ ( 2 ^) d + S P- 12 ) 

The result (7.9) follows immediately from equations (7.7), (7.8), and (7.12).

To bound the birth rate, note that since $G_j(t) = s(j - M(t)) - \mu$, we have

\[ 1 + \mu + G_j(t) - O_j(t) = B'_j(t) \le B^*_j(t) = 1 + \mu + G_j(t). \]

Since $s(q_j - C_3) \le G_j(t) \le s(q_j + C_3)$ for sufficiently large $N$ by (6.15) and part 1 of Proposition
4.3, the inequality (7.10) now follows from (7.12) and (2.3).

Finally, if $t \in [\tau_j, \tau_{j+1} \wedge \zeta)$, then since $G_{j-1}(t) = G_j(t) - s$, part 1 of Proposition 4.3 gives
$s(q_j - C_3 - 1) \le G_{j-1}(t) \le s(q_j + C_3 - 1)$. Now (7.11) follows from this observation and (4.7). □

We will use the bounds in Lemma 7.2 to obtain a coupling in which $(X'_j(t), t \ge \tau_j)$ is bounded
between two branching processes with immigration. More specifically, we will construct processes
$(X^+_j(t), t \ge 0)$ and $(X^-_j(t), t \ge 0)$ such that

\[ X^-_j(t) \le X'_j(t + \tau_j) \le X^+_j(t) \tag{7.13} \]

for $t \le \kappa_j$, where

\[ \kappa_j = \inf\{u : X^+_j(u) > s/(2\mu)\} \wedge ((\tau_{j+1} \wedge \zeta) - \tau_j). \tag{7.14} \]

The processes $(X^+_j(t), t \ge 0)$ and $(X^-_j(t), t \ge 0)$ evolve according to the following rules. First,
$X^+_j(t)$ is the size at time $t$ of a population for which, at time $t$:

• New immigrants appear at rate $\phi^+_j(t) = (1 + \delta) s e^{s(q_j + C_4)t}$.

• Each individual gives birth to a new individual at rate $\lambda^+_j = 1 + s(q_j + C_4)$.

• Each individual dies at rate $\nu^+_j = 1 - s$.

Likewise, for the process $(X^-_j(t), t \ge 0)$, at time $t$:

• New immigrants appear at rate $\phi^-_j(t) = (1 - \delta) s e^{s(q_j - C_4)t}$.

• Each individual gives birth to a new individual at rate $\lambda^-_j = 1 + s(q_j - C_4)$.

• Each individual dies at rate $\nu^-_j = 1 + \mu$.
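The rates of the two bounding processes can be collected in a small helper, which makes the net growth rates used later easy to read off: $\lambda^+_j - \nu^+_j = s(q_j + C_4 + 1)$ and $\lambda^-_j - \nu^-_j = s(q_j - C_4) - \mu$. This is a minimal sketch; the function name, dictionary layout, and parameter values are assumptions made for illustration.

```python
import math


def bounding_rates(s, mu, q_j, c4, delta):
    """Return (upper, lower) rate descriptions for the two bounding branching processes."""
    upper = {
        "immigration": lambda t: (1 + delta) * s * math.exp(s * (q_j + c4) * t),
        "birth": 1 + s * (q_j + c4),
        "death": 1 - s,
    }
    lower = {
        "immigration": lambda t: (1 - delta) * s * math.exp(s * (q_j - c4) * t),
        "birth": 1 + s * (q_j - c4),
        "death": 1 + mu,
    }
    return upper, lower


# Hypothetical parameter values, chosen only for illustration.
s, mu, q_j, c4, delta = 0.01, 1e-6, 20.0, 2.0, 0.05
upper, lower = bounding_rates(s, mu, q_j, c4, delta)
upper_growth = upper["birth"] - upper["death"]  # s * (q_j + c4 + 1)
lower_growth = lower["birth"] - lower["death"]  # s * (q_j - c4) - mu
```

Both processes are supercritical whenever $s(q_j - C_4) > \mu$, and the upper process dominates the lower one in every rate that increases the population.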

To establish that a coupling can be achieved so that (7.13) holds, we will give an explicit
construction of the processes $(X^+_j(t), t \ge 0)$ and $(X^-_j(t), t \ge 0)$. To do this, we will construct a
population in which individuals are colored red, yellow, and blue. We will let $X^+_j(t)$ be the total
number of individuals at time $t$, and we will let $X^-_j(t)$ be the total number of red individuals at
time $t$. For $t \le \kappa_j$, the number of individuals at time $t$ that are red or yellow will equal $X'_j(\tau_j + t)$,
which we will refer to as the number of individuals in the "original population". We will number
the individuals in our population by the order in which they were born.

The construction will require the original population process $(X(t), t \ge 0)$, as well as
additional Poisson processes. For each $i \in \mathbb{N}$, we will have Poisson processes $N_{b,i,j}$ and $N_{d,i,j}$ to
help construct births and deaths and an additional Poisson process $N_{m,j}$ to handle immigration.
These will be Poisson processes on $[0, \infty) \times [0, \infty)$ with Lebesgue intensity, which will be
independent of one another and of the original population process. We will also need a sequence $(\beta_{\ell,j})_{\ell=1}^{\infty}$
of independent random variables which are uniformly distributed on $(0,1)$ and are independent
of $(X(t), t \ge 0)$ and the above Poisson processes.

We first construct our population up to time $\kappa_j$. Observe, as we go through the construction,
that the red population has immigration, birth, and death rates of $\phi^-_j(t)$, $\lambda^-_j$, and $\nu^-_j$ respectively,
the total population has immigration, birth, and death rates of $\phi^+_j(t)$, $\lambda^+_j$, and $\nu^+_j$ respectively,
and the red and yellow individuals stay in one-to-one correspondence with the original population.
This construction is well-defined because Lemma 7.2 ensures that the rates described below are
positive and the probabilities indicated below are between zero and one.

• If a type $j$ mutation occurs in the original population at time $\tau_j + t$, then an immigrant
appears at time $t$. This will be the $\ell$th change in the population for some positive integer
$\ell$. We color this immigrant red if $\beta_{\ell,j} \le \phi^-_j(t-)/(\mu X_{j-1}(t-))$, and otherwise we color it
yellow. A blue immigrant appears at time $t$ if the Poisson process $N_{m,j}$ has a point $(t, x)$
with $x \le \phi^+_j(t-) - \mu X_{j-1}(t-)$.

• If the $i$th individual at time $t-$ is blue, then it gives birth to a blue individual at time $t$
if the Poisson process $N_{b,i,j}$ has a point $(t, x)$ with $x \le \lambda^+_j$ and dies at time $t$ if there is a
point $(t, x)$ in $N_{d,i,j}$ with $x \le \nu^+_j$.

• Suppose the $i$th individual at time $t-$ is red. If the corresponding individual in the original
population gives birth at time $\tau_j + t$ as part of a pure birth event, then the $i$th individual
gives birth at time $t$. This will be the $\ell$th change in the population for some $\ell$, and the
new individual born will be red if $\beta_{\ell,j} \le \lambda^-_j/B'_j(t-)$ and otherwise will be yellow. If the
corresponding individual in the original population gives birth at time $\tau_j + t$ as part of a
birth and death event, then the $i$th individual gives birth to a yellow individual at time
$t$. The $i$th individual also gives birth to a blue individual at time $t$ if the Poisson process
$N_{b,i,j}$ has a point at $(t, x)$ with $x \le \lambda^+_j - B^*_j(t-)$.

If the corresponding individual in the original population dies at time $\tau_j + t$ as part of a pure
death event, then this will lead to the $\ell$th change in the population for some $\ell$, and the $i$th
individual dies at time $t$ if $\beta_{\ell,j} \le \nu^+_j/D'_j(t-)$ and otherwise turns blue. If the corresponding
individual in the original population dies at time $\tau_j + t$ as part of a birth and death event, then
the $i$th individual turns blue at time $t$. The $i$th individual also turns yellow at time $t$ if
$N_{d,i,j}$ has a point at $(t, x)$ with $x \le \nu^-_j - D^*_j(t-)$.

• Suppose the $i$th individual at time $t-$ is yellow. If the corresponding individual in the
original population gives birth at time $\tau_j + t$ as part of either a pure birth or a birth and
death event, then a new yellow individual is born at time $t$. The $i$th individual also gives
birth to a blue individual at time $t$ if the Poisson process $N_{b,i,j}$ has a point at $(t, x)$ with
$x \le \lambda^+_j - B^*_j(t-)$.

If the corresponding individual in the original population dies at time $\tau_j + t$ as part of a pure
death event, then this will be the $\ell$th change in the population for some $\ell$, and the $i$th
individual dies at time $t$ if $\beta_{\ell,j} \le \nu^+_j/D'_j(t-)$ and otherwise turns blue. If the corresponding
individual dies at time $\tau_j + t$ as part of a birth and death event, then the $i$th individual turns
blue at time $t$.

At time $\kappa_j$, the coupling with the original population is broken, and we make all yellow individuals
blue. After time $\kappa_j$, the process evolves as follows:

• If $\kappa_j < t \le \xi_j$, then a red immigrant appears at time $t$ if there is a point $(t, x)$ of $N_{m,j}$ with
$x \le \phi^-_j(t-)$ and a blue immigrant appears at time $t$ if there is a point $(t, x)$ of $N_{m,j}$ with
$\phi^-_j(t) < x \le \phi^+_j(t)$.

• If the $i$th individual is blue, it gives birth to a blue individual at time $t$ if $N_{b,i,j}$ has a point
$(t, x)$ with $x \le \lambda^+_j$ and dies at time $t$ if there is a point $(t, x)$ in $N_{d,i,j}$ with $x \le \nu^+_j$.

• Suppose the $i$th individual is red. Then the $i$th individual gives birth to a red individual at
time $t$ if the Poisson process $N_{b,i,j}$ has a point $(t, x)$ with $x \le \lambda^-_j$ and to a blue individual
at time $t$ if $N_{b,i,j}$ has a point $(t, x)$ with $\lambda^-_j < x \le \lambda^+_j$. Also, the $i$th individual dies at time
$t$ if the Poisson process $N_{d,i,j}$ has a point $(t, x)$ with $x \le \nu^+_j$ and turns blue at time $t$ if
$N_{d,i,j}$ has a point $(t, x)$ with $\nu^+_j < x \le \nu^-_j$.
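The rules for a red individual after time $\kappa_j$ are an instance of Poisson thinning: a point $(t, x)$ of a unit-intensity Poisson process on $[0, \infty) \times [0, \infty)$ is used or ignored according to its mark $x$, so points with $x \le \lambda$ arrive at rate $\lambda$. The sketch below classifies a single mark for a red individual; the function names, return labels, and numerical rates are hypothetical.

```python
def classify_birth_mark(x, lam_minus, lam_plus):
    """Offspring colour for a red parent given the mark x of a point of its birth process."""
    if x <= lam_minus:
        return "red"    # red births occur at rate lam_minus
    if x <= lam_plus:
        return "blue"   # extra births bring the total birth rate up to lam_plus
    return None         # marks above lam_plus are ignored


def classify_death_mark(x, nu_plus, nu_minus):
    """Fate of a red individual given the mark x of a point of its death process."""
    if x <= nu_plus:
        return "dies"        # removal from the total population at rate nu_plus
    if x <= nu_minus:
        return "turns blue"  # leaves the red population, stays in the total population
    return None


# Hypothetical rates satisfying lam_minus < lam_plus and nu_plus < nu_minus.
lam_minus, lam_plus = 1.18, 1.22
nu_plus, nu_minus = 0.99, 1.000001
print(classify_birth_mark(1.2, lam_minus, lam_plus))
print(classify_death_mark(0.995, nu_plus, nu_minus))
```

With this classification, the red population loses individuals at total rate $\nu^-_j$ while the total population loses them only at rate $\nu^+_j$, which keeps the red population below the total population.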

For $j \in I$, let $\mathcal{H}_j$ be the $\sigma$-field generated by $\mathcal{F}_{\tau_j}$ along with the Poisson processes $N_{b,i,h}$,
$N_{d,i,h}$, and $N_{m,h}$ and the random variables $\beta_{\ell,h}$ for $h < j$. Because the immigration, birth, and
death rates $\phi^+_j$, $\phi^-_j$, $\lambda^+_j$, $\lambda^-_j$, $\nu^+_j$, and $\nu^-_j$ are all $\mathcal{H}_j$-measurable, conditional on $\mathcal{H}_j$, the processes
$(X^+_j(t), t \ge 0)$ and $(X^-_j(t), t \ge 0)$ are continuous-time branching processes with immigration, in
which the immigration rate varies with time.

Let

\[ \tau'_j = \tau_j + \frac{3}{s q_j} \log \Big( \frac{1}{s q_j} \Big). \]

Note that $\tau_j < \xi_j < \tau'_j$ for sufficiently large $N$. In view of (4.9) and part 3 of Proposition 4.3
along with the fact that $\log(s/\mu)/\log(1/(s k_N)) \to \infty$ as $N \to \infty$ by (2.3), we have $\tau'_j < \tau_{j+1}$
on $\{\zeta > \tau_j\}$ if $N$ is sufficiently large. Lemma 7.4 below helps to bound the probability that
$\kappa_j < \tau'_j - \tau_j$ and therefore helps to ensure that with high probability, (7.13) holds up to time
$\tau'_j - \tau_j$. We will need the following bound on the mean of the branching process.

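Lemma 7.3. For sufficiently large $N$, on $\{\tau_j < \zeta\}$, we have

\[ E[X^+_j(\tau'_j - \tau_j) \mid \mathcal{F}_{\tau_j}] \le \frac{C}{s^3 k_N^4} \log \Big( \frac{1}{s k_N} \Big). \]

Proof. Standard calculations involving supercritical branching processes give

\begin{align*}
E[X^+_j(\tau'_j - \tau_j) \mid \mathcal{F}_{\tau_j}] &= \int_0^{\xi_j - \tau_j} \phi^+_j(u) e^{(\lambda^+_j - \nu^+_j)(\tau'_j - \tau_j - u)} \, du \\
&= (1 + \delta) s \int_0^{\xi_j - \tau_j} e^{s(q_j + C_4)u} e^{s(q_j + C_4 + 1)(\tau'_j - \tau_j - u)} \, du \\
&= (1 + \delta) s e^{s(q_j + C_4 + 1)(\tau'_j - \tau_j)} \int_0^{\xi_j - \tau_j} e^{-su} \, du.
\end{align*}

Now $s(C_4 + 1)(\tau'_j - \tau_j) \to 0$ as $N \to \infty$ by the reasoning in (6.31), and $e^{-s q_j(\tau'_j - \tau_j)} = (s q_j)^3$. Also,

\[ \int_0^{\xi_j - \tau_j} e^{-su} \, du = \frac{1 - e^{-s(\xi_j - \tau_j)}}{s} \le \xi_j - \tau_j = \frac{1}{s q_j} \log \Big( \frac{1}{s q_j} \Big) + \frac{b}{s q_j}. \]

Since $q_j \ge (1 - 2\delta) k_N$ on $\{\tau_j < \zeta\}$ by part 3 of Proposition 4.3, the result follows. □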
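The mean formula for a branching process with time-varying immigration that opens this proof can be checked numerically. The sketch below compares a midpoint-rule evaluation of $\int_0^{T_0} \phi(u) e^{g(T - u)}\,du$ with the closed form obtained after pulling out $e^{gT}$, where $\phi(u) = (1+\delta) s e^{s(q_j + C_4)u}$ and $g = s(q_j + C_4 + 1)$. All names and parameter values are illustrative assumptions; `t_imm` stands for $\xi_j - \tau_j$ and `t_end` for $\tau'_j - \tau_j$.

```python
import math


def mean_closed_form(s, q_j, c4, delta, t_imm, t_end):
    """(1 + delta) s exp(g t_end) * integral_0^{t_imm} exp(-s u) du, with g = s (q_j + c4 + 1)."""
    growth = s * (q_j + c4 + 1)
    return (1 + delta) * s * math.exp(growth * t_end) * (1 - math.exp(-s * t_imm)) / s


def mean_by_quadrature(s, q_j, c4, delta, t_imm, t_end, steps=20000):
    """Midpoint rule for integral_0^{t_imm} phi(u) exp(g (t_end - u)) du."""
    growth = s * (q_j + c4 + 1)
    h = t_imm / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        immigration = (1 + delta) * s * math.exp(s * (q_j + c4) * u)
        total += immigration * math.exp(growth * (t_end - u))
    return total * h


# Hypothetical parameters with t_imm < t_end.
params = dict(s=0.01, q_j=20.0, c4=2.0, delta=0.05, t_imm=40.0, t_end=60.0)
closed = mean_closed_form(**params)
numeric = mean_by_quadrature(**params)
print(closed, numeric)
```

The agreement reflects the cancellation $e^{s(q_j + C_4)u} e^{-s(q_j + C_4 + 1)u} = e^{-su}$ used in the last line of the display.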


Lemma 7.4. We have

\[ \lim_{N \to \infty} P\bigg( A \cap \bigcup_{j \in I} \{\kappa_j < \tau'_j - \tau_j\} \bigg) = 0. \]

Proof. In view of Lemma 6.2, for $j \in I$ we have $\tau'_j < \tau_{j+1} < \zeta$ on $A$. Therefore, for $j \in I$, on $A$
the only way to have $\kappa_j < \tau'_j - \tau_j$ would be to have $X^+_j(t) > s/(2\mu)$ for some $t < \tau'_j - \tau_j$. Because
$(X^+_j(t), t \ge 0)$ is a submartingale, it follows from Doob's Maximal Inequality and Lemma 7.3
that

\begin{align*}
P(A \cap \{\kappa_j < \tau'_j - \tau_j\} \mid \mathcal{F}_{\tau_j}) &\le P\bigg( \sup_{0 \le t \le \tau'_j - \tau_j} X^+_j(t) > \frac{s}{2\mu} \,\bigg|\, \mathcal{F}_{\tau_j} \bigg) 1_{\{\zeta > \tau_j\}} \\
&\le \frac{2\mu}{s} E[X^+_j(\tau'_j - \tau_j) \mid \mathcal{F}_{\tau_j}] 1_{\{\zeta > \tau_j\}} \\
&\le \frac{C\mu}{s^4 k_N^4} \log \Big( \frac{1}{s k_N} \Big). \tag{7.15}
\end{align*}

Summing over $j \in I$, and then using (2.3) and the fact that the cardinality of $I$ is at most $3 T k_N$
by Lemma 6.2, we obtain the result. □

7.3 The probability that a family survives

Here we use the branching process coupling introduced in the previous subsection to obtain upper
and lower bounds on the probability that an individual will acquire a $j$th mutation before time
$\xi_j$ and have descendants surviving a long time into the future.

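Lemma 7.5. Suppose $j \in I$, where $j$ is possibly random, and $\tau_j$ is a stopping time. Define $\mathcal{H}_j$
as in subsection 7.2. On the event $\{\tau_j < \zeta\}$, we have for sufficiently large $N$,

\[ \frac{(1 - 2\delta) e^b}{q_j} \le P(X^-_j(\tau'_j - \tau_j) > 0 \mid \mathcal{H}_j) \le P(X^+_j(\tau'_j - \tau_j) > 0 \mid \mathcal{H}_j) \le \frac{(1 + 2\delta) e^b}{q_j}. \tag{7.16} \]

Also, letting $L^-_j$ and $L^+_j$ denote the numbers of immigrants in $(X^-_j(t), t \ge 0)$ and $(X^+_j(t), t \ge 0)$
respectively that have descendants alive at time $\tau'_j - \tau_j$, for sufficiently large $N$ on $\{\tau_j < \zeta\}$ we
have

\[ P(L^+_j \ge 2 \mid \mathcal{H}_j) \le \frac{2 e^{2b}}{q_j^2}. \tag{7.17} \]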


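Proof. Throughout the proof, we work on the event $\{\tau_j < \zeta\}$. Because $X^-_j(t) \le X^+_j(t)$ for all
$t \ge 0$, the second inequality in (7.16) is obvious. We now prove the third inequality. By (7.1), the
probability that an immigrant in the branching process $(X^+_j(t), t \ge 0)$ at time $u$ has descendants
that survive until time $\tau'_j - \tau_j$ is

\[ \frac{\lambda^+_j - \nu^+_j}{\lambda^+_j - \nu^+_j e^{-(\lambda^+_j - \nu^+_j)(\tau'_j - \tau_j - u)}}. \]

Now $\lambda^+_j - \nu^+_j = s(q_j + C_4 + 1)$. Also, for sufficiently large $N$,

\[ \tau'_j - \xi_j = \frac{2}{s q_j} \log \Big( \frac{1}{s q_j} \Big) - \frac{b}{s q_j} \ge \frac{3}{2 s q_j} \log \Big( \frac{1}{s q_j} \Big). \]

Therefore, if $u \le \xi_j - \tau_j$, then

\[ \nu^+_j e^{-(\lambda^+_j - \nu^+_j)(\tau'_j - \tau_j - u)} \le (1 - s) e^{-s(q_j + C_4 + 1)(\tau'_j - \xi_j)} \le e^{-s q_j (\tau'_j - \xi_j)} \le (s q_j)^{3/2}, \]

which, in view of part 3 of Proposition 4.3 and assumption A3, implies that for sufficiently large
$N$,

\[ \frac{\lambda^+_j - \nu^+_j}{\lambda^+_j - \nu^+_j e^{-(\lambda^+_j - \nu^+_j)(\tau'_j - \tau_j - u)}} \le \frac{s(q_j + C_4 + 1)}{1 + s(q_j + C_4) - (s q_j)^{3/2}} \le s(q_j + C_4 + 1). \]

Therefore,

\begin{align*}
E[L^+_j \mid \mathcal{H}_j] &= \int_0^{\xi_j - \tau_j} \phi^+_j(u) \cdot \frac{\lambda^+_j - \nu^+_j}{\lambda^+_j - \nu^+_j e^{-(\lambda^+_j - \nu^+_j)(\tau'_j - \tau_j - u)}} \, du \\
&\le \int_0^{\xi_j - \tau_j} (1 + \delta) s e^{s(q_j + C_4)u} \cdot s(q_j + C_4 + 1) \, du \\
&= (1 + \delta) s^2 (q_j + C_4 + 1) \bigg( \frac{e^{s(q_j + C_4)(\xi_j - \tau_j)} - 1}{s(q_j + C_4)} \bigg) \\
&\le (1 + \delta) \bigg( \frac{q_j + C_4 + 1}{q_j + C_4} \bigg) s e^{s(q_j + C_4)(\xi_j - \tau_j)}.
\end{align*}

Because $e^{s q_j(\xi_j - \tau_j)} = e^b/(s q_j)$ and $C_4 s(\xi_j - \tau_j) \to 0$ as $N \to \infty$ by (6.31), it follows that for
sufficiently large $N$,

\[ E[L^+_j \mid \mathcal{H}_j] \le \frac{(1 + 2\delta) e^b}{q_j}. \tag{7.18} \]

The conditional Markov's Inequality now gives the third inequality in (7.16). Because the
conditional distribution of $L^+_j$ given $\mathcal{H}_j$ is Poisson, we have $P(L^+_j \ge 2 \mid \mathcal{H}_j) \le (E[L^+_j \mid \mathcal{H}_j])^2$. Therefore,
(7.17) also follows from (7.18).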

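It remains to prove the first inequality in (7.16). The argument is similar to that for the third
inequality, but we will need a lower bound on the expectation. For sufficiently large $N$,

\begin{align*}
E[L^-_j \mid \mathcal{H}_j] &= \int_0^{\xi_j - \tau_j} \phi^-_j(u) \cdot \frac{\lambda^-_j - \nu^-_j}{\lambda^-_j - \nu^-_j e^{-(\lambda^-_j - \nu^-_j)(\tau'_j - \tau_j - u)}} \, du \\
&\ge \int_0^{\xi_j - \tau_j} \phi^-_j(u) \cdot \frac{\lambda^-_j - \nu^-_j}{\lambda^-_j} \, du \\
&= \int_0^{\xi_j - \tau_j} (1 - \delta) s e^{s(q_j - C_4)u} \cdot \frac{s(q_j - C_4) - \mu}{1 + s(q_j - C_4)} \, du \\
&= \frac{(1 - \delta) s^2 (q_j - C_4 - \mu/s)}{1 + s(q_j - C_4)} \bigg( \frac{e^{s(q_j - C_4)(\xi_j - \tau_j)} - 1}{s(q_j - C_4)} \bigg) \\
&\ge \frac{(1 - (3/2)\delta) e^b}{q_j}. \tag{7.19}
\end{align*}

Because the conditional distribution of $L^-_j$ given $\mathcal{H}_j$ is Poisson, we have

\[ P(X^-_j(\tau'_j - \tau_j) > 0 \mid \mathcal{H}_j) = P(L^-_j > 0 \mid \mathcal{H}_j) = 1 - e^{-E[L^-_j \mid \mathcal{H}_j]} \ge E[L^-_j \mid \mathcal{H}_j] - (E[L^-_j \mid \mathcal{H}_j])^2. \]

The first inequality in (7.16) follows from this result and (7.19). □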
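The proof uses two elementary facts about a Poisson random variable $L$ with mean $m$: $P(L \ge 2) \le m^2$ and $P(L > 0) = 1 - e^{-m} \ge m - m^2$. The sketch below checks both numerically on a few sample means; the function names and the chosen values of $m$ are illustrative assumptions.

```python
import math


def prob_at_least_two(m: float) -> float:
    """P(L >= 2) for L Poisson with mean m, i.e. 1 - exp(-m) (1 + m)."""
    return 1.0 - math.exp(-m) * (1.0 + m)


def prob_positive(m: float) -> float:
    """P(L > 0) for L Poisson with mean m, computed stably as -expm1(-m)."""
    return -math.expm1(-m)


for m in (1e-3, 0.05, 0.5, 2.0):
    # Upper bound used for (7.17) and lower bound used for the first inequality in (7.16).
    print(m, prob_at_least_two(m) <= m ** 2, prob_positive(m) >= m - m ** 2)
```

Both bounds are tight to leading order only when $m$ is small, which is the relevant regime here because the mean is of order $e^b/q_j$.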


7.4 The size of a surviving family 

The lemma below bounds the probability that some individual will acquire a $j$th mutation before
time $\xi_j$ and have at least $x e^{s q_j(\tau'_j - \tau_j)}$ descendants alive at time $\tau'_j$. Recall from (4.6) and part 1 of
Proposition 4.3 that $e^{s q_j(\tau'_j - \tau_j)}$ is approximately the number of type $j$ individuals that we would
expect there to be in the population in the absence of such an early type $j$ mutation. This result
is the precise version of (3.5), which is the key to understanding why the Bolthausen-Sznitman
coalescent describes the genealogy of the population.

Lemma 7.6. Fix $j \in I$, and recall the definition of $\mathcal{H}_j$ from subsection 7.2. For sufficiently large
$N$, on $\{\tau_j < \zeta\}$, we have for all $x \in [\delta/2, 2/\delta]$,

\[ P(X^-_j(\tau'_j - \tau_j) > x e^{s q_j(\tau'_j - \tau_j)} \mid \mathcal{H}_j) \ge \frac{1 - 7\delta}{q_j x}, \tag{7.20} \]

and for all $x \in [e^{-b}, 2/\delta]$,

\[ P(X^+_j(\tau'_j - \tau_j) > x e^{s q_j(\tau'_j - \tau_j)} \mid \mathcal{H}_j) \le \frac{1 + 7\delta}{q_j x}. \tag{7.21} \]

Proof. Throughout the proof, we work on the event $\{\tau_j < \zeta\}$. We first prove (7.21). Suppose
$x \in [e^{-b}, 2/\delta]$. If $X^+_j(\tau'_j - \tau_j) > x e^{s q_j(\tau'_j - \tau_j)}$, then either two immigrants in the population have
descendants alive at time $\tau'_j - \tau_j$, an event whose probability has already been bounded above
in (7.17), or else for some $u \in (0, \xi_j - \tau_j]$, an immigrant arrives at time $u$ and has more than
$x e^{s q_j(\tau'_j - \tau_j)}$ descendants at time $\tau'_j - \tau_j$. Note that

\[ [(\lambda^+_j - \nu^+_j) - s q_j](\tau'_j - \tau_j) = s(C_4 + 1) \cdot \frac{3}{s q_j} \log \Big( \frac{1}{s q_j} \Big) \to 0 \tag{7.22} \]

as $N \to \infty$ by the reasoning in (6.31). Therefore, for sufficiently large $N$, we have

\[ x e^{s q_j(\tau'_j - \tau_j)} \ge (1 - \delta) x e^{(\lambda^+_j - \nu^+_j)(\tau'_j - \tau_j)}. \tag{7.23} \]

Suppose an immigrant arrives at time $u$, and let $X^+_{j,u}(t)$ be the number of descendants of this
immigrant in the population at time $t$. For $t \ge 0$, let

\[ W^+(t) = e^{-(\lambda^+_j - \nu^+_j)t} X^+_{j,u}(t + u), \tag{7.24} \]

and let $W^+ = \lim_{t \to \infty} W^+(t)$, which exists by (7.3). Equations (7.23) and (7.24) imply that for
the immigrant to have more than $x e^{s q_j(\tau'_j - \tau_j)}$ descendants in the population at time $\tau'_j - \tau_j$, if $N$
is sufficiently large we must have

\[ W^+(\tau'_j - \tau_j - u) \ge (1 - \delta) x e^{(\lambda^+_j - \nu^+_j)u}. \tag{7.25} \]

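To estimate the probability that this occurs, observe that by Lemma 7.1 and (7.2),

\begin{align*}
P\big( |W^+ - W^+(\tau'_j - \tau_j - u)| > \delta x e^{(\lambda^+_j - \nu^+_j)u} \big) &\le \frac{2 \lambda^+_j e^{-(\lambda^+_j - \nu^+_j)(\tau'_j - \tau_j - u)}}{\delta^2 x^2 e^{2(\lambda^+_j - \nu^+_j)u} (\lambda^+_j - \nu^+_j)} \\
&\le \frac{2 (1 + s(q_j + C_4)) e^{-(\lambda^+_j - \nu^+_j)(\tau'_j - \tau_j)}}{\delta^2 x^2 s(q_j + C_4 + 1)}.
\end{align*}

Since $e^{-(\lambda^+_j - \nu^+_j)(\tau'_j - \tau_j)} \le e^{-s q_j(\tau'_j - \tau_j)} = (s q_j)^3$, it follows that for sufficiently large $N$,

\[ P\big( |W^+ - W^+(\tau'_j - \tau_j - u)| > \delta x e^{(\lambda^+_j - \nu^+_j)u} \big) \le \frac{3 (s q_j)^2}{\delta^2 x^2}. \tag{7.26} \]

Note that $\lambda^+_j - \nu^+_j \ge s q_j$, and $(1 - \delta/2) s q_j \le (\lambda^+_j - \nu^+_j)/\lambda^+_j \le (1 + \delta) s q_j$ for sufficiently large $N$.
Therefore, by (7.2) and (7.4), for sufficiently large $N$,

\begin{align*}
P\big( W^+ > (1 - 2\delta) x e^{(\lambda^+_j - \nu^+_j)u} \big) &= \frac{\lambda^+_j - \nu^+_j}{\lambda^+_j} \exp \bigg( -(1 - 2\delta) x e^{(\lambda^+_j - \nu^+_j)u} \cdot \frac{\lambda^+_j - \nu^+_j}{\lambda^+_j} \bigg) \\
&\le (1 + \delta) s q_j e^{-(1 - 3\delta) s q_j x e^{s q_j u}}. \tag{7.27}
\end{align*}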


The probability of the event in (7.25) is bounded above by the sum of the expressions in (7.26)
and (7.27). Thus, combining this result with (7.17), we have

\begin{align*}
&P(X^+_j(\tau'_j - \tau_j) > x e^{s q_j(\tau'_j - \tau_j)} \mid \mathcal{H}_j) \\
&\qquad \le \frac{2 e^{2b}}{q_j^2} + \int_0^{\xi_j - \tau_j} (1 + \delta) s e^{s(q_j + C_4)u} \bigg( (1 + \delta) s q_j e^{-(1 - 3\delta) s q_j x e^{s q_j u}} + \frac{3 (s q_j)^2}{\delta^2 x^2} \bigg) \, du. \tag{7.28}
\end{align*}

Using that $e^{s q_j(\xi_j - \tau_j)} = e^b/(s q_j)$ and that $s(\xi_j - \tau_j) \to 0$ as $N \to \infty$ by (6.31), we have, for
sufficiently large $N$,


T \l + 5)se s ^ +C4)u ■ 

Jo 


3( - Sq ’ )2 in < 4e ‘ S 


5 2 x 2 


5 2 


x 


(7.29) 


Also, making the substitution y = (1 — 3 8)sqjxe sq i u , so that dy/du = sqjy, and using again that 
s(£j — Tj) -» 0 as N —>■ oo, for sufficiently large N we have 


^ (1 + 5)se s ^ +C * )u • (1 + 5)sq j e-( 1 - 35)sq i xeSqjU du 

Jo 


< (1 + 5) 2 S 2 qj e CiS ^- r i) ^ e sq j u e -(.l-3S)« U xe«i u du 

Jo 

r(l—3S)sqjXe“ 9 3^i— T i^ „—y 

= (1 + S) 2 s 2 qj e c ^-^ f 9 2 dy 

J(l—36)sqjX (1 - 3 5)s-q z j x 

1 + 6 5 


< 


7+ 


From (17.281) . (17.29H . and (17.301) . we get 

P(x; ( ,' - r 3 ) > + ^ + 


(7.30) 


(7.31) 


Recall that (1 — 2<5)fc/v < q 3 < (e + 2<5)fc/v on {tj < </} by part 3 of Proposition 14.31 Since fc/v —>• oo 
and skjsr —>• 0 as N —>• oo by assumptions A1 and A3 respectively, the upper bound (17.2111 follows 
from (17.3111 . 

Next, we will suppose x € [<5/2, 2/A] and show (17.2011 by similar arguments. We consider only 
the individuals colored red in the construction given above. Suppose a red immigrant arrives at 
time u. Then let Xj u (t) denote the number of red descendants of this immigrant at time t, and 
for t > 0 , let 

W-{t) = e~^-^ t XT u {t + u). 

Let W~ = lirn^oo W~(t). Because |(A“ — vj) — sqj | —> 0 as A^ —>• oo by the reasoning in (|7.22|) . 
the reasoning that led to f]7.25[> implies that if 


W"(rj - t 3 - u) > (1 + 5)xe {X i ~ v i )u (7.32) 

and N is large enough, then we must have A/(r] — Tj) > xe sqj ^ Tq ~ Tj \ Because s(r ? - — Tj) —>• 0 as 
N —>• oo by the reasoning in (16.3111 . we have 

e -(\ T -U )(+'—+) < (i + 6)(s qj ) 3 

for sufficiently large N. Therefore, by the reasoning leading to ()7.26f) . for sufficiently large N we 
have 

\[ P\big( |W^- - W^-(\tau'_j - \tau_j - u)| > \delta x e^{(\lambda^-_j - \nu^-_j)u} \big) \le \frac{3 (s q_j)^2}{\delta^2 x^2}. \tag{7.33} \]
Note that $\lambda^-_j - \nu^-_j \le s q_j$, and $(1 - \delta) s q_j \le (\lambda^-_j - \nu^-_j)/\lambda^-_j \le (1 + \delta/2) s q_j$ for sufficiently large $N$.
Therefore, by (7.4),


P(W“ > (1 + 25)xe {X i ~ v i )u ) = 


^ c ~(l+2^)xe (A j )u (\ j -v j )/\. 


> _ ^ e -0-+^)^3 Xe - Bqi ' 1 


(7.34) 


By using (17.331) and (17.341) to bound from below the probability in (17.321) . we get that for suffi¬ 
ciently large N, 


P(X-(rj - Tj) > xe^-^lHj) 


> 


J*’ "’{I - 5)se s ^~ c ^ ((1 


— S)e~ ( ' 1+3S ' >sqjXeSqjU — 


3 (sqj)* 
S 2 x 2 


du. 


(7.35) 


Following the reasoning in (17.301) . this time using the substitution y = (1 + 3 5)sqjxe sq i u , we get 

T \i _ S)se a ^- C ^ • (1 - S)e-^ 1+3S)sq ^ eSqjU du 

Jo 

Y _ r(l+36)sqjXe S9 i^i- T ^ 


> 


> 


Qj x J (l+3(5)sqja: 
1-6 8 ( 

( 

q jX 
1-6 5 


e y dy 

—(l+3S)sqjX — (l+38)e b x 


) 


q jX 


(l — (1 + 35)sqjX — e e x ). 


(7.36) 


On {rj < C}, by part 3 of Proposition 14.31 we have sqjX < 2(e + 2 5)sk]y/5 —>• 0 as N —> oo. 
Also, using the definition of b from (14.11) . we have e~ e x < e“ 120007 "/( 5er ). Therefore, using (17.361) 
to bound the first term in (17.351) . and using the reasoning of (17.291) to bound the second term, 
we obtain (17.201) . □ 


In view of (7.13), Lemmas 7.5 and 7.6 show that the number of early type $j$ individuals is
well-approximated up to time $\tau'_j$ by a continuous-time branching process. The result below tells
us that the number of early type $j$ individuals at time $\tau_{j+1}$ is usually determined, to within a
small error, by the number of such individuals at time $\tau'_j$.

Lemma 7.7. For $j \in I$, define the event

\[ \Lambda_j = \Big\{ \big| e^{-s q_j(\tau'_j - \tau_j)} X'_j(\tau'_j) - e^{-\int_{\tau_j}^{\tau_{j+1}} G_j(v) \, dv} X'_j(\tau_{j+1}) \big| > e^{-b} \Big\}. \tag{7.37} \]

Then

\[ \lim_{N \to \infty} P\bigg( A \cap \bigcup_{j \in I} \Lambda_j \bigg) = 0. \]


Proof. Let S be the set of individuals at time rj descended from individuals that acquired their 
jth mutation during the time interval {t 3 . ffi. which means there are Aj(rj) individuals in the 
set S. Then, using the notation of Corollary 14.81 with rj in place of k, we get that for t > rj, 


e 


f .*Ar J+1 AC 

■M 

J 


Gj(v) dv 


Xj(t A r j+ \ A (J) 


Aj(rj) + Z/(t), 


(7.38)
where $(Z'_j(\tau'_j + t), t \ge 0)$ is a mean zero martingale. Therefore, on $\{\tau_{j+1} < \zeta\}$, we have


, , . —sqj(T , -—Tj)—fj' + 1 Gj(v)dv , , \ a 

e- s ^-^X , j (T , j ) = e 33 v X'(T J+1 )-e- sq ^-^Z^r j+1 ) 

= e f^GA-)-s qj )dv e -f T ]^ Gj (v)dv x ,^ + ^ e -^(r'-r J ) z 5 (r? . +i) _ ( 739 ) 

By (14.5p . on {Tj + \ < C}, we have e~^ dv X'-{rj + 1 ) < C}. Also, by part 1 of Proposition 

l4~3l on {rj + 1 < £}, we have 


f T j 

/ |Gj(u) - s^| du < C' 3 s(r' - Tj ), 

J Tn 


(7.40) 


which tends to zero as IV —>• 00 by the argument in (16.3111 . Thus, (I7.39[) implies that for sufficiently 
large N, on {tj + 1 < <}}, we have 

\c ^ - e~% +1 Gj(v) dv X'{T j+1 )\ < ^ + e- sq ^- T ^\Zf{T j+1 )\. (7.41) 


It remains to bound \Z^ (tj + i)|. By Corollary 14.81 and the argument leading to (I4.20p . 


Var (z/(rj + t)|< 3 E 


f 


(rj+i)Ar, +1 AC _ 2 /“ GjO) <*> , 


Xj ( u ) du 




(7.42) 


Because Gj{y) > s(qj — C 3 ) for u € [ T i, Tj+i A C) by part 1 of Proposition 14.31 it follows from 

7 *M 


equations (17.38(1 and (17.421) . Pubini’s Theorem, and the fact that (Zf(r!j + t),t > 0) is a mean 


zero martingale that for sufficiently large N, 


Var(Zf(r' + t)\^) < 3 E 


(rj+t)Ar J+ iAC - Gj (u) dv , 


< 3 E 


(t'+OATj+iAC 


< 3X'(r' 


/ 

•'t' 

f 

rex 


e 3 


(X (r ) + Z?(t))d« 




e - S fe-C 3 )( U -r')( X /( /) + ^ 




3 —fc-C 3 )(«-^) ^ 


< 


4X'(rj) 




Therefore, by the L 2 Maximum Inequality for martingales, 


o~b 


P( |z/(r i+1 )| > 




( T j ) „~2.sg, (p -Tj ) _ r v'ln-' 


= CX , j (T')(sq j fe 2b . (7.43) 


sqje 


-2b 


On {kj > rj — Tj}, we have X'(rj) < (rj — Tj) by (17.131) . Let J 7 *, be the cr-field generated by 
Jy/ and the event («j > rj — Tj}. Since the additional Poisson processes N^j, and N m j 

and random variables fyj are independent of the population process (X(f),f > 0), we have on 

{Hj > Tj - Tj }, 

p(jZf(Tj +1 )| > < CXj~(Tj — Tj)(sk]\r) 5 e 2b .
Therefore, taking conditional expectations of both sides of (7.43) with respect to $\mathcal{F}_{\tau_j}$ and then
using Lemma 7.3 and part 3 of Proposition 4.3, we get


> r' - Tj} n ||Z/(r i+1 )| > F r )j < CE[X+{r'j - Tj )\F Tj }(sk N fe 

C{sk N ) 2 e 2b 


5„2 b 


kN 


log 


sk]\r 


(7.44) 


Using Boole’s Inequality and summing over j G I, we now deduce from equations (17.41 [1 and 
(17.44[> and Lemmas 16.21 and 17.41 that 


p(An U/j ) £ 3» N .U^,„ g (J-), 


which tends to zero as N —>• oo by assumption A3. 


□ 


7.5 The fraction of individuals descended from an early mutation 

To determine the genealogy of the population, it will be important to consider the fraction of type 
j individuals in the population descended from an early type j mutation, as this is an estimate 
of the fraction of lineages that will coalesce near the time of this mutation. To this end, we let 


$$Y_j = \frac{X_j'(\tau_{j+1})}{\lceil s/\mu \rceil}, \tag{7.45}$$

which is the fraction of type $j$ individuals at time $\tau_{j+1}$ that are descended from a type $j$ mutation that occurred between times $\tau_j$ and $\xi_j$. Also, define


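$$Y_j^- = \frac{(e^{-sq_j(\tau_j' - \tau_j)} X_j^+(\tau_j' - \tau_j) - e^{-b}) \vee 0}{((e^{-sq_j(\tau_j' - \tau_j)} X_j^+(\tau_j' - \tau_j) - e^{-b}) \vee 0) + 1 + 4\delta}$$

and

$$Y_j^+ = \frac{e^{-sq_j(\tau_j' - \tau_j)} X_j^+(\tau_j' - \tau_j) + e^{-b}}{e^{-sq_j(\tau_j' - \tau_j)} X_j^+(\tau_j' - \tau_j) + e^{-b} + 1 - 4\delta}. \tag{7.46}$$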

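Lemma 7.8. Suppose $j \in I$. For sufficiently large $N$, on $\{\tau_j < \zeta\}$, we have, for all $y \in [\delta, 1 - \delta]$,

$$\frac{(1-y)(1 - 13\delta)}{q_j y} \le P(Y_j^- \ge y \mid \mathcal{H}_j) \le P(Y_j^+ \ge y \mid \mathcal{H}_j) \le \frac{(1-y)(1 + 13\delta)}{q_j y}. \tag{7.47}$$

Also, on $A_j^c \cap \{\tau_{j+1} < \zeta\} \cap \{\kappa_j > \tau_j' - \tau_j\}$, we have

$$Y_j^- \le Y_j \le Y_j^+. \tag{7.48}$$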


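Proof. We first prove (7.47). Suppose $y \in [\delta, 1 - \delta]$. The middle inequality in (7.47) is immediate. To prove the third inequality in (7.47), note that $Y_j^+ \ge y$ if and only if

$$e^{-sq_j(\tau_j' - \tau_j)} X_j^+(\tau_j' - \tau_j) \ge \frac{(1 + e^{-b} - 4\delta)y - e^{-b}}{1 - y}. \tag{7.49}$$

Since $e^{-b}/y < \delta$ by (4.1), we see that (7.49) implies $e^{-sq_j(\tau_j' - \tau_j)} X_j^+(\tau_j' - \tau_j) \ge (1 - 5\delta)y/(1 - y)$.

Thus, by Lemma 7.6, for sufficiently large $N$, on the event $\{\tau_j < \zeta\}$, we have for all $y \in [\delta, 1 - \delta]$,

$$P(Y_j^+ \ge y \mid \mathcal{H}_j) \le P\bigg(e^{-sq_j(\tau_j' - \tau_j)} X_j^+(\tau_j' - \tau_j) \ge \frac{(1 - 5\delta)y}{1 - y} \,\bigg|\, \mathcal{H}_j\bigg) \le \frac{(1 + 7\delta)(1 - y)}{(1 - 5\delta) q_j y},$$

which leads to the third inequality in (7.47). Likewise, note that $Y_j^- \ge y$ if and only if

$$e^{-sq_j(\tau_j' - \tau_j)} X_j^+(\tau_j' - \tau_j) \ge \frac{(1 - e^{-b} + 4\delta)y + e^{-b}}{1 - y},$$

which, since $e^{-b}/y < \delta$, will always hold if $e^{-sq_j(\tau_j' - \tau_j)} X_j^+(\tau_j' - \tau_j) \ge (1 + 5\delta)y/(1 - y)$. Therefore, by Lemma 7.6,

$$P(Y_j^- \ge y \mid \mathcal{H}_j) \ge P\bigg(e^{-sq_j(\tau_j' - \tau_j)} X_j^+(\tau_j' - \tau_j) \ge \frac{(1 + 5\delta)y}{1 - y} \,\bigg|\, \mathcal{H}_j\bigg) \ge \frac{(1 - 7\delta)(1 - y)}{(1 + 5\delta) q_j y},$$

which implies the first inequality in (7.47). It remains to prove (7.48).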

The last statement of part 1 of Proposition 4.1, combined with (4.9), implies that on the event $\{\tau_{j+1} < \zeta\}$, no individual that gets a $j$th mutation at or before time $\tau_j$ has a descendant alive at time $\tau_{j+1}$. In particular, we have $X_j'(\tau_{j+1}) = X_{j,1}(\tau_{j+1})$. Therefore, using also that $X_{j,1}(\tau_{j+1}) + X_{j,2}(\tau_{j+1}) = X_j(\tau_{j+1}) = \lceil s/\mu \rceil$, we get, on $\{\tau_{j+1} < \zeta\}$,


Yj = 


X j,l(T j+1 ) 


e -/ T / G^)dv x . i{r . +i) 


-. (7.50) 


XjArj+i) + Xj, 2 (T j+2 ) e - ^ G i( v) *>Xj, 1 (T j+1 ) + e~ ti +1 G » ^ Xj , 2 (r j+2 )' 

By (jMD, on {r j+ i < C}, 

1-45 < e“ Gj{v) dv X j)2 {T j+ 1 ) <1 + 45. (7.51) 


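Combining (7.50), (7.51), and the definition of $A_j$, we get that on $A_j^c \cap \{\tau_{j+1} < \zeta\}$,

$$\frac{e^{-sq_j(\tau_j' - \tau_j)} X_j'(\tau_j') - e^{-b}}{e^{-sq_j(\tau_j' - \tau_j)} X_j'(\tau_j') - e^{-b} + 1 + 4\delta} \le Y_j \le \frac{e^{-sq_j(\tau_j' - \tau_j)} X_j'(\tau_j') + e^{-b}}{e^{-sq_j(\tau_j' - \tau_j)} X_j'(\tau_j') + e^{-b} + 1 - 4\delta}.$$

Combining this observation with (7.13) and noting that $Y_j \ge 0$, we conclude that (7.48) holds on $A_j^c \cap \{\tau_{j+1} < \zeta\} \cap \{\kappa_j > \tau_j' - \tau_j\}$. □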


8 Coupling with the Bolthausen-Sznitman coalescent 

In this section, we prove Theorem 2.1 by establishing a coupling between the coalescent process $(\Pi_N(u), 0 \le u \le t + 1)$ and the Bolthausen-Sznitman coalescent. Our strategy will involve examining the process at the times $\tau_j$. A very similar idea was used by Desai, Walczak, and Fisher (2013).
8.1 No coalescence between times $\tau_L$ and $a_N T$

Recall from Remark 5.5 that with probability tending to one as $N \to \infty$, no lineages coalesce as they are traced back from time $a_N T$ to time $\tau_{L+10}$. The result below shows that the lineages are also unlikely to coalesce as they are traced back further from time $\tau_{L+10}$ to time $\tau_L$, which implies the statement (2.5) from Theorem 2.1. As with Lemmas 5.4 and 6.5, it is sufficient to state the result for the first two lineages.

Lemma 8.1. We have

$$\limsup_{N \to \infty} P(A \cap \{T_{1,2} > \tau_L\}) \le CTe^{-b}. \tag{8.1}$$

In particular, the statement (2.5) holds.

Proof. Let $\ell_1 = U_1(a_N T)$ and $\ell_2 = U_2(a_N T)$. Without loss of generality, suppose $\ell_1 \le \ell_2$. We know from the argument in Remark 5.5 that

$$\lim_{N \to \infty} P(A \cap \{T_{1,2} > \tau_{L+10}\}) = 0,$$

so we only need to follow these two lineages between times $\tau_L$ and $\tau_{L+10}$. By Lemmas 5.2 and 6.4, we know that, outside of an event whose intersection with $A$ has probability tending to zero as $N \to \infty$, for $i \in \{1, 2\}$ we have $U_i(\tau_{j+1}) = j$ for $j \in \{L - 1, L, \dots, \ell_i\}$ and $U_i(\tau_{j+1}) = \ell_i$ for $j \in \{\ell_i, \dots, L + 9\}$. When this occurs, there are only three ways that these lineages could coalesce between times $\tau_L$ and $\tau_{L+10}$, in view of the fact that only lineages of the same type can coalesce:

1. We have $\ell_1 = \ell_2$ and $T_{1,2} > \tau_{\ell_1 + 1}$.

2. We have £\ < £2 and t ^ 1+ 1 < T\ 2 < ^ 2 , 4+1 < r h+ 2 - That is, as we trace back the ancestral 
lines, the second lineage gets traced back to a type I\ individual, then coalesces with the 
first lineage between times ti 1+ \ and T( 1+ 2 - 

3. For some $j \in \{L - 1, L, \dots, \ell_1\}$, two type $j$ lineages at time $\tau_{j+1}$ are descended from the same type $j - 1$ lineage at time $\tau_j$.

Lemma 5.4 bounds the probability of the first possibility above, while Lemma 6.6 bounds the probability of the second possibility. It remains only to consider the third possibility, in which the lineages coalesce between times $\tau_j$ and $\tau_{j+1}$ for $j \in \{L - 1, L, \dots, \ell_1\}$. As noted in the discussion in subsection 6.3, Lemmas 6.5 and 6.6 establish that the probability that such a coalescence event occurs without the ancestor acquiring an early type $j$ mutation is bounded above by $CTe^{-b}$. Also, because the result of Lemma 7.5 holds even when $j$ is random provided that $\tau_j$ is a stopping time, we have

$$P(A \cap \{X_j^+(\tau_j' - \tau_j) > 0 \text{ for some } j \in \{L - 1, L, \dots, L + 9\}\}) \le \frac{Ce^b}{k_N},$$

where we have used also part 3 of Proposition 4.3. In view of (7.13) and Lemma 7.4, it follows that the probability that, for some $j \in \{L - 1, L, \dots, \ell_1\}$, two type $j$ lineages at time $\tau_{j+1}$ are descended from an early type $j$ mutation tends to zero as $N \to \infty$. The result (8.1) now follows from the bounds collected in this paragraph.

Finally, since $\tau_L \le a_N(T - 1)$ on $A$ by (5.4) and (4.9), the statement (2.5) follows from (8.1),
mu), and the fact that e > 0 and 5 > 0 are arbitrary. □ 
8.2 Representing the early type j mutations by a point process

Fix $j \in I$. Recall from the discussion before Lemma 6.5 that the individuals sampled at time $a_N T$ are typically descended from type $j$ individuals at time $\tau_{j+1}$, and these lineages will typically coalesce only if they are traced back to one individual that acquires its $j$th mutation before time $\xi_j$. We construct in this subsection a point process that encodes these coalescence events.

Let $A_j$ be the event that $A$ occurs and that $U_i(\tau_{j+1}) = j$ for all $i \in \{1, \dots, n\}$. Suppose we condition on the event $A_j$, the random variables $Y_\ell = X_\ell'(\tau_{\ell+1})/\lceil s/\mu \rceil$ and $\tau_\ell$ for $\ell \in I$, and the partitions $\Pi_N(T - \tau_\ell/a_N)$ for $\ell \in I$ with $\ell \ge j + 1$. Denote the blocks of $\Pi_N(T - \tau_\ell/a_N)$ by $B_{\ell,1}, \dots, B_{\ell,n_\ell}$, where we rank the blocks in order by their smallest element. By the definition of $A_j$, the $n_{j+1}$ individuals in the population at time $\tau_{j+1}$ that are ancestors of individuals in the sample are all among the $\lceil s/\mu \rceil$ type $j$ individuals in the population at time $\tau_{j+1}$. However, by the symmetry in the process, all $\lceil s/\mu \rceil(\lceil s/\mu \rceil - 1) \cdots (\lceil s/\mu \rceil - n_{j+1} + 1)$ possible choices of $n_{j+1}$ individuals out of these $\lceil s/\mu \rceil$ are equally likely to be the ancestors of the individuals in the sample corresponding to the integers in the blocks $B_{j+1,1}, \dots, B_{j+1,n_{j+1}}$ respectively. Also, $X_j'(\tau_{j+1})$ of the $\lceil s/\mu \rceil$ type $j$ individuals at time $\tau_{j+1}$ are descended from an individual that got an early type $j$ mutation between times $\tau_j$ and $\xi_j$. We call these type $j$ individuals good.

We now construct some uniformly distributed random variables $Z_{i,j}$ for $i \in \{1, \dots, n\}$ and $j \in I$. Begin by defining random variables $Z_{i,j}^*$ for $i \in \{1, \dots, n\}$ and $j \in I$ which are uniformly distributed on $[0,1]$ and independent of the population process $(X(t), t \ge 0)$ and of one another. If $j > L + 1$, then let $Z_{i,j} = Z_{i,j}^*$. Likewise, if either $A_j$ does not occur or $n_{j+1} < i \le n$, then let $Z_{i,j} = Z_{i,j}^*$. Now suppose $A_j$ occurs. For $i \in \{1, \dots, n_{j+1}\}$, we call the $(i,j)$ ancestor the individual at time $\tau_{j+1}$ that is the ancestor of the individuals in the sample whose label is in the block $B_{j+1,i}$. Let $K_0 = 0$, and for $i \in \{1, \dots, n_{j+1} - 1\}$, let $K_i$ be the number of integers $h \in \{1, \dots, i\}$ such that the $(h,j)$ ancestor is good. Then, conditioning on $K_{i-1}$ in addition to the event $A_j$, the random variables $Y_\ell$ and $\tau_\ell$ for $\ell \in I$, and the partitions $\Pi_N(T - \tau_\ell/a_N)$ for $\ell \in I$ with $\ell \ge j + 1$, the probability that the $(i,j)$ ancestor is good is

$$P_{i,j} = \frac{X_j'(\tau_{j+1}) - K_{i-1}}{\lceil s/\mu \rceil - (i - 1)}.$$

Let $Z_{i,j} = Z_{i,j}^* P_{i,j}$ if the $(i,j)$ ancestor is good, and let $Z_{i,j} = P_{i,j} + Z_{i,j}^*(1 - P_{i,j})$ otherwise. Note that $Z_{i,j}$ has a uniform distribution on $[0,1]$, and the $(i,j)$ ancestor is good if and only if $Z_{i,j} \le P_{i,j}$. Also, the random variables $Z_{i,j}$ are jointly independent of the random variables $Y_\ell$ and the stopping times $\tau_\ell$ for $\ell \in I$.
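
The two-case definition of $Z_{i,j}$ can be checked numerically. The Python sketch below is illustrative only and is not part of the argument: the function names `coupled_uniform` and `sample_pair` are hypothetical, a single fixed number `p` stands in for $P_{i,j}$, and the Boolean `good` stands in for the event that the $(i,j)$ ancestor is good.

```python
import random
from typing import Tuple


def coupled_uniform(z_star: float, p: float, good: bool) -> float:
    """Return z_star * p if good, and p + z_star * (1 - p) otherwise."""
    if good:
        return z_star * p
    return p + z_star * (1.0 - p)


def sample_pair(p: float, rng: random.Random) -> Tuple[float, bool]:
    """Draw the indicator (true with probability p) and the coupled variable."""
    good = rng.random() < p
    z_star = rng.random()  # independent Uniform[0, 1) variable
    return coupled_uniform(z_star, p, good), good


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_pair(0.3, rng) for _ in range(20000)]
    # The empirical mean should be close to 1/2 if the output is uniform.
    print(sum(z for z, _ in draws) / len(draws))
    # The indicator should agree with the event {z <= p} on essentially every draw.
    print(sum((z <= 0.3) == good for z, good in draws) / len(draws))
```

The check relies on the elementary computation that, for $x \le p$, the output is at most $x$ with probability $p \cdot (x/p) = x$, and similarly for $x > p$.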

Let $\Phi_N$ be the point process on $[0, t + 1] \times [0,1]^{n+1}$ consisting of all of the points

$$\Big(T - \frac{\tau_j}{a_N}, Y_j, Z_{1,j}, \dots, Z_{n,j}\Big)$$

such that $j \in I$, $j < L$, and $Y_j > 0$. We use the point process $\Phi_N$ to construct a coalescent process $(\Pi_N^*(u), 0 \le u \le t + 1)$ as follows. Let $\Pi_N^*(0) = \{\{1\}, \dots, \{n\}\}$. For $u \in (0, t + 1]$, suppose $(u, y, z_1, \dots, z_n)$ is a point of $\Phi_N$ and $\Pi_N^*(u-) = \pi$, where $\pi$ is a partition of $\{1, \dots, n\}$ whose blocks, ordered by their smallest elements, are $B_1, \dots, B_\ell$. Then $\Pi_N^*(u)$ is obtained from $\Pi_N^*(u-)$ by merging together all of the blocks $B_i$ for which $z_i \le y$. The result below relates the coalescent processes $(\Pi_N(u), 0 \le u \le t + 1)$ and $(\Pi_N^*(u), 0 \le u \le t + 1)$.

Lemma 8.2. We have


H p (n { n "( T -%) -■"•»(r■- £)}) > 1 - <**. (8.2) 

Proof. We claim that the event in (8.2) could fail to hold in the following ways:

1. Either $\Pi_N(T - \tau_L/a_N) \ne \{\{1\}, \dots, \{n\}\}$ or $\Pi_N^*(T - \tau_L/a_N) \ne \{\{1\}, \dots, \{n\}\}$.

2. The event $A_j$ could fail to hold for some $j \in I$ with $j < L$.

3. For some $j \in I$, either the event $A_j'$ defined in the statement of Lemma 6.5 or the event $A_j^*$ defined in the statement of Lemma 6.6 occurs.

4. For some $j \in I$, two or more individuals at time $\tau_j$ have descendants that got a $j$th mutation before time $\xi_j$ and then have type $j$ descendants in the population at time $\tau_{j+1}$.

5. For some $j \in I$ with $j < L$ and $Y_j > 0$, and some $i \in \{1, \dots, n\}$, the random variable $Z_{i,j}$ is between $P_{i,j}$ and $Y_j$.

To see that these are the only possibilities, recall from the discussion at the beginning of subsection 6.3 that if $A_\ell$ occurs for all $\ell \in I$ with $\ell < L$, then unless $A_j'$ or $A_j^*$ occurs, the only way that lineages can coalesce between times $\tau_j$ and $\tau_{j+1}$ is for two or more lineages at time $\tau_{j+1}$ to be traced back to one individual that acquires its $j$th mutation before time $\xi_j$. Unless the fourth event listed above occurs, the only way this can happen is for a group of lineages at time $\tau_{j+1}$ to get traced back to the same individual that acquires its $j$th mutation before time $\xi_j$. In this case, suppose $\Pi_N(T - \tau_{j+1}/a_N) = \Pi_N^*(T - \tau_{j+1}/a_N) = \pi_{j+1}$, and $B_{j+1,1}, \dots, B_{j+1,n_{j+1}}$ are the blocks of $\pi_{j+1}$, ranked in order by their smallest elements. By the construction described at the beginning of this subsection, we obtain $\Pi_N(T - \tau_j/a_N)$ by merging the blocks $B_{j+1,i}$ for which $Z_{i,j} \le P_{i,j}$. We obtain $\Pi_N^*(T - \tau_j/a_N)$ by merging the blocks $B_{j+1,i}$ for which $Z_{i,j} \le Y_j$. Therefore, we can only have $\Pi_N(T - \tau_j/a_N) \ne \Pi_N^*(T - \tau_j/a_N)$ if the fifth event listed above occurs.

We thus need to bound the probabilities of the five events listed above. Recall that $P(A^c) \le 2\varepsilon$ by (4.11). By construction, $(T - \tau_j/a_N, Y_j, Z_{1,j}, \dots, Z_{n,j})$ will only be a point of $\Phi_N$ if $j < L$, and $\tau_L \le a_N(T - 1)$ on $A$ by (5.4) and (4.9). It follows that $\Pi_N^*(T - \tau_L/a_N) = \{\{1\}, \dots, \{n\}\}$ on $A$. Also, by Lemma 8.1, the probability that $A$ occurs and $\Pi_N(T - \tau_L/a_N) \ne \{\{1\}, \dots, \{n\}\}$ is at most $Cn^2Te^{-b} \le Cn^2\varepsilon$ in view of (4.1). By Lemma 6.4, the probability that $A$ occurs and the second event above occurs tends to zero as $N \to \infty$. Lemmas 6.5 and 6.6 show that the probability that $A$ occurs and the third event above occurs is at most $Cn^2Te^{-b} \le Cn^2\varepsilon$. The probability that $A$ occurs and the fourth event above occurs tends to zero as $N \to \infty$ by (7.17) along with (7.13), Lemma 7.4, and part 3 of Proposition 4.3.

It remains to bound the probability of the fifth event above. For sufficiently large $N$,

$$|P_{i,j} - Y_j| = \frac{|(i-1)X_j'(\tau_{j+1}) - K_{i-1}\lceil s/\mu \rceil|}{\lceil s/\mu \rceil(\lceil s/\mu \rceil - (i-1))} \le \frac{n\lceil s/\mu \rceil}{\lceil s/\mu \rceil(\lceil s/\mu \rceil - (i-1))} \le \frac{2n\mu}{s}.$$

Because $Z_{i,j}$ has a uniform distribution on $[0,1]$ and is independent of $Y_j$, the probability that $Z_{i,j}$ is between $P_{i,j}$ and $Y_j$ is at most $2n\mu/s$. Therefore, using Lemma 6.2, the probability that this occurs for some $i \in \{1, \dots, n\}$ and $j \in I$ is at most $6n^2Tk_N\mu/s$, which tends to zero as $N \to \infty$ by (2.3) and assumption A2. The lemma follows. □
8.3 A Poisson point process derived from $\Phi_N$

In this subsection, we modify the point process $\Phi_N$ to obtain a Poisson point process $\Phi$ from which we can construct a Bolthausen-Sznitman coalescent via the technique outlined in subsection 3.1. The random variables $Z_{j,1}, \dots, Z_{j,n}$ are already independent and uniformly distributed on $[0,1]$, and they will remain unchanged. However, we will define new random variables $Y_j^*$ that are coupled with the original random variables $Y_j$ as well as new times $T_j^*$.

For $j \in I$, let $Z_j$ be a random variable having the uniform distribution on $[0,1]$ that is independent of the population process. Recall the definition of the $\sigma$-field $\mathcal{H}_j$ from subsection 7.2. Define the random function

$$H_j(y, z) = P(Y_j^+ < y \mid \mathcal{H}_j) + zP(Y_j^+ = y \mid \mathcal{H}_j), \quad \text{for all } y, z \in [0,1].$$

Also, let $F_j(y) = P(Y_j^+ \le y \mid \mathcal{H}_j) = H_j(y, 1)$, and for $x \in [0,1]$, let $F_j^{-1}(x) = \sup\{y : F_j(y) \le x\}$. Then it is easy to see that almost surely

$$Y_j^+ = F_j^{-1}(H_j(Y_j^+, Z_j)).$$

Note that if 0 < x < 1, then there is a random integer K(x) such that 


Pi y+ < BY 

1 - \S/A 




n 


Then 


P(H j (Y^Z J )<x\n j ) = P[Y+< 


K{x) 


Ui )+P Y+ = 


K(x) + 1 


n 


\s/rt ' V V i \s/y\ \ 

x-P(Y+< K(x)/\s/y]\n j ) 
j ~ P(Y+ = (K{x) + l)/\shA\U,) 


(8.3) 


= x. 


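Therefore, the conditional distribution of $H_j(Y_j^+, Z_j)$ given $\mathcal{H}_j$ is uniform on $[0,1]$. For $x > 0$,

$$K_j(x) = \begin{cases} e^{-(\tau_{j+1}^* - \tau_j^*)(1 - x)/(a_N x)} & \text{if } \varepsilon \le x \le 1 \\ e^{-(\tau_{j+1}^* - \tau_j^*)(1 - \varepsilon)/(a_N \varepsilon)} & \text{if } 0 \le x < \varepsilon \\ 0 & \text{if } x < 0. \end{cases}$$

For $x \in [0,1]$, let $K_j^{-1}(x) = \sup\{y : K_j(y) \le x\}$. Also, let

$$Y_j^* = K_j^{-1}(H_j(Y_j^+, Z_j)). \tag{8.4}$$

Then for all $x > 0$, we have

$$P(Y_j^* \le x \mid \mathcal{H}_j) = K_j(x). \tag{8.5}$$

Note that $Y_j^*$ never takes a value between $0$ and $\varepsilon$, so if $Y_j^* > 0$, then $Y_j^* \ge \varepsilon$.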
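
The coupling of $Y_j^+$ and $Y_j^*$ through one uniform random variable can be illustrated with a short Python sketch. It is a toy version under stated assumptions, not the paper's construction: the discrete law `pmf` is an arbitrary stand-in for the conditional law of $Y_j^+$, the number `c` stands in for $(\tau_{j+1}^* - \tau_j^*)/a_N$, and all function names are hypothetical. The sketch assumes `c > 0` and parameters moderate enough that the exponential does not underflow.

```python
import math
import random
from typing import Dict, Tuple


def randomized_cdf(pmf: Dict[float, float], y: float, z: float) -> float:
    """H(y, z) = P(Y < y) + z * P(Y = y) for a discrete law {value: probability}."""
    below = sum(prob for value, prob in pmf.items() if value < y)
    return below + z * pmf.get(y, 0.0)


def k_cdf(x: float, c: float, eps: float) -> float:
    """The function K for x <= 1, with c standing in for (tau*_{j+1} - tau*_j)/a_N."""
    if x < 0:
        return 0.0
    x = max(x, eps)  # K is constant on [0, eps)
    return math.exp(-c * (1.0 - x) / x)


def k_inverse(u: float, c: float, eps: float) -> float:
    """Generalized inverse sup{y : K(y) <= u} for u in [0, 1]."""
    if u < k_cdf(0.0, c, eps):
        return 0.0  # corresponds to the atom at zero
    return c / (c - math.log(u))  # solves exp(-c (1 - y) / y) = u


def coupled_draw(pmf: Dict[float, float], c: float, eps: float,
                 rng: random.Random) -> Tuple[float, float]:
    """Draw Y from pmf and return (Y, Y*) driven by the same uniform variable."""
    y = rng.choices(list(pmf), weights=list(pmf.values()))[0]
    u = randomized_cdf(pmf, y, rng.random())
    return y, k_inverse(u, c, eps)
```

Because both outputs are nondecreasing functions of the same uniform variable, a larger value of the first coordinate never comes with a smaller value of the second, and the second coordinate is either $0$ or at least `eps`.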

We now continue with the construction of $\Phi$. For all $j \in I$, independently of the population process $(X(t), t \ge 0)$ and of all other auxiliary random variables introduced up to this point, let $T_j^*$ be uniformly distributed on $[T - \tau_{j+1}^*/a_N, T - \tau_j^*/a_N]$, and let $\Phi_j'$ be a Poisson point process on $[T - \tau_{j+1}^*/a_N, T - \tau_j^*/a_N] \times [0,1]^{n+1}$ with intensity

$$du \times x^{-2}\,dx \times dz_1 \times \cdots \times dz_n.$$

For all $j$ such that $T_j^* \in [1, t + 1]$ and $Y_j^* > 0$, the point process $\Phi$ will include the point $(T_j^*, Y_j^*, Z_{j,1}, \dots, Z_{j,n})$. Also, for all $j$ such that $Y_j^* > 0$, the point process $\Phi$ will include all points of $\Phi_j'$ whose first coordinate is in $[1, t + 1]$ and whose second coordinate is in the interval $[\varepsilon, Y_j^*)$. Finally, $\Phi$ will include all points of $\Phi_j'$ whose first coordinate is in $[1, t + 1]$ and whose second coordinate is less than $\varepsilon$.

Lemma 8.3. The point process $\Phi$ defined above is a Poisson point process on $[1, t + 1] \times [0,1]^{n+1}$ with intensity

$$du \times x^{-2}\,dx \times dz_1 \times \cdots \times dz_n. \tag{8.6}$$


Proof. We separately consider, for each $j$, the restriction of $\Phi$ to points whose first coordinate is in the interval $[T - \tau_{j+1}^*/a_N, T - \tau_j^*/a_N]$. Note that for a Poisson point process with intensity (8.6), the expected number of points in the region $[T - \tau_{j+1}^*/a_N, T - \tau_j^*/a_N] \times [x, 1] \times [0,1]^n$ is

$$\frac{\tau_{j+1}^* - \tau_j^*}{a_N} \int_x^1 y^{-2}\,dy = \frac{(\tau_{j+1}^* - \tau_j^*)(1 - x)}{a_N x}.$$

Therefore, from (8.5), we see that if $x \ge \varepsilon$, then $P(Y_j^* \le x \mid \mathcal{H}_j)$ is the probability that there are no points in this region. Using also that $T_j^*$ is uniformly distributed on $[T - \tau_{j+1}^*/a_N, T - \tau_j^*/a_N]$ and that the random variables $Z_{j,1}, \dots, Z_{j,n}$ are uniformly distributed on $[0,1]^n$, it follows that

$$(T_j^*, Y_j^*, Z_{j,1}, \dots, Z_{j,n})$$

has the same distribution as the point whose second coordinate is the largest among points of a Poisson process with intensity (8.6) restricted to $[T - \tau_{j+1}^*/a_N, T - \tau_j^*/a_N] \times [\varepsilon, 1] \times [0,1]^n$. Furthermore, conditional on the event that such a Poisson process has a point whose second coordinate is $y$ and no point whose second coordinate is larger than $y$, the distribution of the restriction of the Poisson process to $[T - \tau_{j+1}^*/a_N, T - \tau_j^*/a_N] \times [\varepsilon, y) \times [0,1]^n$ is that of a Poisson process with intensity (8.6). It thus follows from the construction of $\Phi$ that the restriction of $\Phi$ to $[T - \tau_{j+1}^*/a_N, T - \tau_j^*/a_N]$ has intensity given by (8.6).

Finally, because of the conditioning on $\mathcal{H}_j$ in (8.5), the random variables $Y_j^*$ for $j \in I$ are independent. Because the Poisson processes $\Phi_j'$ are independent, it follows that the restrictions of $\Phi$ to the intervals $[T - \tau_{j+1}^*/a_N, T - \tau_j^*/a_N]$ are independent. The lemma now follows from the superposition theorem for Poisson processes. □

We now use the Poisson point process $\Phi$ to construct a coalescent process $(\Pi(u), 0 \le u \le t + 1)$. Let $\Pi(u) = \{\{1\}, \dots, \{n\}\}$ for $u \in [0,1]$. For $u \in (1, t + 1]$, suppose $(u, y, z_1, \dots, z_n)$ is a point of $\Phi$ and $\Pi(u-) = \pi$, where $\pi$ is a partition of $\{1, \dots, n\}$ into the blocks $B_1, \dots, B_\ell$, ordered by their smallest element. Then $\Pi(u)$ is obtained from $\Pi(u-)$ by merging together all of the blocks $B_i$ for which $z_i \le y$. As discussed in subsection 3.1, this construction is well-defined, and the process $(\Pi(1 + u), 0 \le u \le t)$ obeys the law of the Bolthausen-Sznitman coalescent.
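
The merging rule just described can be sketched in Python. This is a simplified illustration rather than the construction used in the proofs: it keeps only points whose second coordinate is at least a cutoff `eps` (an assumption made here so that the number of points is finite), it runs on the time interval $[0, t]$, and the function names are hypothetical. With the cutoff, a given pair of blocks merges at rate $\int_\varepsilon^1 y^2 \cdot y^{-2}\,dy = 1 - \varepsilon$ instead of rate one.

```python
import random
from typing import List


def merge_step(blocks: List[List[int]], y: float, zs: List[float]) -> List[List[int]]:
    """Merge all blocks B_i (ordered by smallest element) for which zs[i] <= y."""
    chosen = [i for i in range(len(blocks)) if zs[i] <= y]
    if len(chosen) < 2:
        return blocks  # fewer than two marked blocks: nothing merges
    merged = sorted(x for i in chosen for x in blocks[i])
    rest = [blocks[i] for i in range(len(blocks)) if i not in chosen]
    return sorted(rest + [merged], key=lambda b: b[0])


def simulate(n: int, t: float, eps: float, rng: random.Random) -> List[List[int]]:
    """Run the truncated construction on [0, t]; requires 0 < eps < 1."""
    blocks = [[i] for i in range(1, n + 1)]
    rate = 1.0 / eps - 1.0  # integral of y^{-2} over [eps, 1]
    u = rng.expovariate(rate)
    while u <= t:
        v = rng.random()
        y = 1.0 / (1.0 / eps - v * rate)  # inverse-CDF draw from density ~ y^{-2}
        zs = [rng.random() for _ in range(n)]
        blocks = merge_step(blocks, y, zs)
        u += rng.expovariate(rate)
    return blocks
```

For example, `merge_step([[1], [2], [3]], 0.5, [0.1, 0.9, 0.3])` merges the first and third blocks and leaves the second alone.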


8.4 Comparing $Y_j$ and $Y_j^*$

The goal in this subsection is to prove two lemmas that establish that, with high probability, the random variables $Y_j$ and $Y_j^*$ are close. Lemma 8.6 bounds the probability that either $Y_j$ or $Y_j^*$ is greater than $\varepsilon$, but the other is not. Lemma 8.7 bounds the probability that the difference between $Y_j$ and $Y_j^*$ is more than $\varepsilon^2$. We will need a couple of preliminary estimates.

Lemma 8.4. For $j \in I$, let


A" - 
A j ~ 


Qj ( Tj 

k N q \a N 


>sy 


Then

$$\lim_{N \to \infty} P\Big(A \cap \bigcup_{j \in I} A_j''\Big) = 0.$$

Proof. Lemma 6.2 and part 1 of Proposition 4.1 imply that on $A$, the fittest individual in the population at time $\tau_j$ must have either $j$ or $j - 1$ mutations. It therefore follows from (4.2) and (4.12), along with the fact that $\tau_j \ge a_N + 2a_N/k_N$ for all $j \in I$ by Lemma 6.2, that $Q(\tau_j)$ must either equal $q_j$ or $q_j - 1$ on $A$ for all $j \in I$. Let $S = [1 + (T - (t + 2))/2, T]$, which is a compact subset of $(1, \infty)$. It follows from Proposition 4.4 that

$$\sup_{t \in S} \bigg| \frac{Q(a_N t)}{k_N} - q(t) \bigg| \to_p 0,$$

where $\to_p$ denotes convergence in probability as $N \to \infty$. By (6.16) and Lemma 6.2, on $A$ we have $\tau_j/a_N \in S$ for all $j \in I$. Therefore,


sup 

je/ 


f-A 

kN 


\qn 


1 a —tp 0 , 


which implies the lemma. 


□ 


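Lemma 8.5. There is a positive constant $C$ such that if $\varepsilon \le y \le 1$ and $j \in I$, then on the event $\{\tau_j < \zeta\} \cap (A_j'')^c \in \mathcal{H}_j$, we have for sufficiently large $N$,

$$\frac{(1 - y)(1 - C\delta T)}{q_j y} \le 1 - K_j(y) \le \frac{(1 - y)(1 + C\delta T)}{q_j y}.$$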


Proof. By (16.11) . 


Also, by (14.161) and (|6.4j) . we have on {rj < £} D ( A '') c , 


T U i- r ; 1 


1 1 

1 

1 

kN 

qn qj 


k N q{T*/a N ) qj 

kN 

q{r*/a N ) 

Qj 


Qj ( T j 
k N q \a N 


<5 + 


Pl)-J 

a-N 


' i 




ON 


<5 + 10 eST. 


(8.7) 


( 8 . 8 ) 


Therefore, using (18.71) and () 8 . 8 |) along with the facts that q(Tj /on) > 1 by Proposition 14.51 and 
that qj/kN >1 — 2 6 on {rj < £} by part 3 of Proposition 14.31 we get that on {rj < D (A”) c , 


T j +1 T i 


ON 


1 

Qj 


< 


C 5 T 

kN 


Because $|(1 - e^{-x}) - x| \le x^2/2$ for $x \ge 0$ and (6.1) holds, it follows that when $\varepsilon \le y \le 1$, we have for sufficiently large $N$, on $\{\tau_j < \zeta\} \cap (A_j'')^c$,


(1 -Kj{y))~ 


(! -y) 


QjV 


< 


1 ((t* + 1 - r*)(l - y) 


CLNV 


+ 


i -y 

T j +1 “ T j 1 

^ l ~y 

CAT 

y 

on qj 

^ y 

kN 


Because $q_j \le (e + 2\delta)k_N$ on $\{\tau_j < \zeta\}$ by part 3 of Proposition 4.3, the result follows.


□ 
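Lemma 8.6. Letting $\Delta$ denote the symmetric difference between two events, for sufficiently large $N$ we have

$$P\Big(A \cap \bigcup_{j \in I} (\{Y_j \ge \varepsilon\} \Delta \{Y_j^* \ge \varepsilon\})\Big) \le \frac{C\delta T^2}{\varepsilon}.$$

Proof. By Lemmas 7.8 and 8.5 and part 3 of Proposition 4.3,

$$|P(Y_j^+ \ge \varepsilon \mid \mathcal{H}_j) - P(Y_j^* \ge \varepsilon \mid \mathcal{H}_j)| \le \frac{C\delta T}{\varepsilon k_N}$$

for sufficiently large $N$ on $\{\tau_j < \zeta\} \cap (A_j'')^c$, and the same result holds with $Y_j^-$ in place of $Y_j^+$. Because $Y_j^- \le Y_j^+$, and the random variables $Y_j^+$ and $Y_j^*$ are monotone functions of the same uniformly distributed random variable by (8.3) and (8.4), it follows that

$$P(\{Y_j^+ \ge \varepsilon\} \Delta \{Y_j^* \ge \varepsilon\} \mid \mathcal{H}_j) \le \frac{C\delta T}{\varepsilon k_N}$$

on $\{\tau_j < \zeta\} \cap (A_j'')^c$, and the same result holds with $Y_j^-$ in place of $Y_j^+$. Let $\Psi_j = A_j^c \cap \{\tau_{j+1} < \zeta\} \cap \{\kappa_j > \tau_j' - \tau_j\}$. By (7.48), we have

$$(\{Y_j^- \ge \varepsilon\} \cap \Psi_j) \subset (\{Y_j \ge \varepsilon\} \cap \Psi_j) \subset (\{Y_j^+ \ge \varepsilon\} \cap \Psi_j).$$

It follows that on $\{\tau_j < \zeta\} \cap (A_j'')^c$, we have

$$P((\{Y_j \ge \varepsilon\} \Delta \{Y_j^* \ge \varepsilon\}) \cap \Psi_j \mid \mathcal{H}_j) \le \frac{C\delta T}{\varepsilon k_N}.$$

The result follows by taking expectations, summing over $j \in I$, and using Lemmas 7.4, 7.7, and 8.4 along with the fact that the cardinality of $I$ is at most $3Tk_N$ by Lemma 6.2. □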

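Lemma 8.7. There is a positive constant $C^*$, not depending on $\varepsilon$, $b$, or $T$, such that for sufficiently large $N$, we have

$$P\Big(A \cap \bigcup_{j \in I} (\{|Y_j - Y_j^*| > C^*\varepsilon^2\} \cap \{Y_j \ge \varepsilon\} \cap \{Y_j^* \ge \varepsilon\})\Big) \le \frac{C\delta T \log(1/\varepsilon)}{\varepsilon^2}.$$

Proof. We first compare $Y_j^*$ to $Y_j^+$. In view of (8.3) and (8.4), we need to compare the functions $F_j^{-1}$ and $K_j^{-1}$. Suppose $z \in (0,1)$. If $F_j^{-1}(1 - z) \in [\delta, 1 - \delta]$, then (7.47) implies that on $\{\tau_j < \zeta\}$, we have

$$\frac{1 - 13\delta}{q_j z + 1 - 13\delta} \le F_j^{-1}(1 - z) \le \frac{1 + 13\delta}{q_j z + 1 + 13\delta}.$$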


Likewise, Lemma 1831 implies that if K- 1 (1 — z) > e, then on {t 3 < £} D ( A”) c , we have 

1 — (75 i , , l + <75 

< A " (1 -z)< 


q jZ + 1-Cb ~ 3 


qjZ + 1 + (75 


It follows that on the event {tj < £} n ( A ") c , if F- X (1 — z) E [5, 1 — 5] and K- X (1 — z) E [e, 1], 
then 

\Fr 1 (l-z)-Kr\l- z )\<CS. (8.9) 
Because $F_j^{-1}$ and $K_j^{-1}$ are increasing functions taking their values in $[0,1]$, and $\delta < \varepsilon$ by (5.1), we see that (8.9) holds on $\{\tau_j < \zeta\} \cap (A_j'')^c$ as long as $F_j^{-1}(1 - z) \in [\varepsilon, 1]$ and $K_j^{-1}(1 - z) \in [\varepsilon, 1]$. Since $\delta < \varepsilon^2$ by (5.1), it follows that there is a positive constant $C^*$ such that on $\{\tau_j < \zeta\} \cap (A_j'')^c$, we have

$$|Y_j^+ - Y_j^*| 1_{\{Y_j^* \ge \varepsilon\}} 1_{\{Y_j^+ \ge \varepsilon\}} \le (C^* - 1)\varepsilon^2. \tag{8.10}$$

It remains to control the difference between $Y_j^+$ and $Y_j$. By (7.47), on $\{\tau_j < \zeta\}$,

E h +1 0' + >«) - Y- V-> e) |W 3 ] = f (P(Y+ 1 {1 , + 2e) > y\Hj) - P(Y- l {y ->„ } > y\Hj)) dy 

3 3 J Q 3 3 

= J* {P(Y+ > e\U 3 ) - P(Y~ > e\Hj)) dy 

+ [ l ( P(Y+ > y\Uj) - P(Yf > yl-Hj)) dy 


^ (1 -e)C6 

< £ • ----h 

qj£ 


/* 1 —s 

J £ 


c ^-y) dy+s { ‘ 1+CT ) 




Qj( 1 - S) 


< 


C5 log(l/e) 
Qj 


Let '■I’ j = Aj 11 {Tj- |_i < C} H { Kj > Tj — Tj}. Because Yj < Yj < Y^ on 'I'j, by (17.481) . 

£ [(h +1 (V>«> - s C ' 51og<1/s) 

Now Markov’s Inequality implies that 


Qj 


^ + l{y. + > £ }l >£ — 


C6\og(l/e) 

qjs 2 


Combining this result with (18.101) and part 3 of Lemma 14.31 gives, for sufficiently large N, 
P({\Yj -Y* I > c*e 2 } n {Yj > £} n {Y* > e} n Vj n (A") c n A) < C( ^ og ^ . 
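The result follows by summing over $j$ and using Lemmas 7.4, 7.7, and 8.4. □

8.5 Small coalescence events

Lemma 8.8 below shows that it is unlikely that lineages will coalesce between times $\tau_j$ and $\tau_{j+1}$ if $Y_j < \varepsilon$.

Lemma 8.8. For sufficiently large $N$, we have

$$P\Big(A \cap \bigcup_{j \in I} \Big(\Big\{\Pi_N^*\Big(T - \frac{\tau_j}{a_N}\Big) \ne \Pi_N^*\Big(T - \frac{\tau_{j+1}}{a_N}\Big)\Big\} \cap \{Y_j < \varepsilon\}\Big)\Big) \le CTn^2\varepsilon.$$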
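Proof. Suppose $j \in I$. Let $\Psi_j = A_j^c \cap \{\tau_{j+1} < \zeta\} \cap \{\kappa_j > \tau_j' - \tau_j\}$, where $A_j$ is the event defined in Lemma 7.7 and $\kappa_j$ is defined in (7.14). Define the $\sigma$-field $\mathcal{H}_j$ as in subsection 7.2. Let $\mathcal{G}_j$ be the $\sigma$-field generated by the $\sigma$-field $\mathcal{H}_j$, the random variable $Y_j$ defined in (7.45), and the event $\Psi_j$. Conditional on $\mathcal{G}_j$, the probability that at least two of the random variables $Z_{1,j}, \dots, Z_{n,j}$ are less than or equal to $Y_j$ is at most $\binom{n}{2} Y_j^2$. Therefore, on $\{\tau_j < \zeta\}$, we have

$$P\Big(\Big\{\Pi_N^*\Big(T - \frac{\tau_j}{a_N}\Big) \ne \Pi_N^*\Big(T - \frac{\tau_{j+1}}{a_N}\Big)\Big\} \cap \{Y_j < \varepsilon\} \cap \Psi_j \,\Big|\, \mathcal{G}_j\Big) \le \binom{n}{2} Y_j^2 1_{\{Y_j < \varepsilon\}} 1_{\Psi_j}.$$

Now take conditional expectations of both sides with respect to $\mathcal{H}_j$ to get that on $\{\tau_j < \zeta\}$,

$$P\Big(\Big\{\Pi_N^*\Big(T - \frac{\tau_j}{a_N}\Big) \ne \Pi_N^*\Big(T - \frac{\tau_{j+1}}{a_N}\Big)\Big\} \cap \{Y_j < \varepsilon\} \cap \Psi_j \,\Big|\, \mathcal{H}_j\Big) \le \binom{n}{2} E[Y_j^2 1_{\{Y_j < \varepsilon\}} 1_{\Psi_j} \mid \mathcal{H}_j]. \tag{8.11}$$

Recall that for any nonnegative random variable $X$, we have $E[X^2] = \int_0^\infty 2xP(X > x)\,dx$. Therefore, on $\{\tau_j < \zeta\}$,

$$E[Y_j^2 1_{\{Y_j < \varepsilon\}} 1_{\Psi_j} \mid \mathcal{H}_j] = \int_0^\infty 2xP(Y_j 1_{\{Y_j < \varepsilon\}} 1_{\Psi_j} > x \mid \mathcal{H}_j)\,dx \le \int_0^\varepsilon 2xP(Y_j 1_{\Psi_j} > x \mid \mathcal{H}_j)\,dx. \tag{8.12}$$

Recall from (7.48) that $Y_j \le Y_j^+$ on $\Psi_j$. Also, from (7.13), we see that on $\Psi_j$, if $Y_j > 0$ then $X_j^+(\tau_j' - \tau_j) > 0$, and on $\{\tau_j < \zeta\}$, we have $q_j \ge (1 - 2\delta)k_N$ by part 3 of Proposition 4.3. Therefore, by Lemma 7.5,

$$P(Y_j 1_{\Psi_j} > 0 \mid \mathcal{H}_j) \le \frac{Ce^b}{k_N}. \tag{8.13}$$

Also, on $\Psi_j$, if $Y_j \ge x$ and $3e^{-b} \le x \le \varepsilon$, it follows from (7.46) that if $\varepsilon$ is sufficiently small, then

$$e^{-sq_j(\tau_j' - \tau_j)} X_j^+(\tau_j' - \tau_j) \ge \frac{(e^{-b} + 1 - 4\delta)x - e^{-b}}{1 - x} \ge \frac{x}{2}.$$

Therefore, Lemma 7.6 implies that if $\varepsilon$ is sufficiently small and $N$ is sufficiently large, and if $3e^{-b} \le x \le \varepsilon$, then

$$P(Y_j 1_{\Psi_j} > x \mid \mathcal{H}_j) \le \frac{C}{k_N x}. \tag{8.14}$$

Dividing the integral on the right-hand side of (8.12) into two pieces and using (8.13) to estimate the first piece and (8.14) to estimate the second piece, we get

$$E[Y_j^2 1_{\{Y_j < \varepsilon\}} 1_{\Psi_j} \mid \mathcal{H}_j] \le \int_0^{3e^{-b}} 2x \cdot \frac{Ce^b}{k_N}\,dx + \int_{3e^{-b}}^{\varepsilon} 2x \cdot \frac{C}{k_N x}\,dx \le \frac{Ce^{-b}}{k_N} + \frac{C\varepsilon}{k_N} \le \frac{C\varepsilon}{k_N}. \tag{8.15}$$

Using (8.15) to bound the right-hand side of (8.11) and then taking expectations, we get

$$P\Big(\Big\{\Pi_N^*\Big(T - \frac{\tau_j}{a_N}\Big) \ne \Pi_N^*\Big(T - \frac{\tau_{j+1}}{a_N}\Big)\Big\} \cap \{Y_j < \varepsilon\} \cap \Psi_j\Big) \le \frac{Cn^2\varepsilon}{k_N}. \tag{8.16}$$

The result now follows by summing over $j$ and using Lemmas 7.4 and 7.7. □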
8.6 Completion of the coupling argument

Fix a positive integer $d$ and times $0 = t_0 < t_1 < \cdots < t_d = t$. Recall that equation (2.5) was established as part of Lemma 8.1. Therefore, to prove Theorem 2.1, we need to show that the joint distribution of $(\Pi_N(1 + t_0), \dots, \Pi_N(1 + t_d))$ converges as $N \to \infty$ to the joint distribution of $(\Pi(1 + t_0), \dots, \Pi(1 + t_d))$, where $(\Pi(u), 0 \le u \le t + 1)$ is the coalescent process derived from the Poisson point process $\Phi$ at the end of subsection 8.3.

Proof of Theorem 2.1. The key to the proof will be to show that with high probability, we have

$$\Pi\Big(T - \frac{\tau_j^*}{a_N}\Big) = \Pi_N^*\Big(T - \frac{\tau_j}{a_N}\Big) \quad \text{for all } j \in I \text{ with } j < L. \tag{8.17}$$

Recall that the coalescent process $\Pi_N^*$ was constructed from the point process $\Phi_N$ in the same way that $\Pi$ was constructed from $\Phi$. Therefore, we simply need to compare the two constructions. If (8.17) fails to hold, then one of the following must occur:

1. Either $\Pi_N^*(T - \tau_L/a_N) \ne \{\{1\}, \dots, \{n\}\}$ or $\Pi(T - \tau_L^*/a_N) \ne \{\{1\}, \dots, \{n\}\}$.

2. For some $j \in I$, we have either $Y_j \ge \varepsilon$ and $Y_j^* < \varepsilon$, or $Y_j < \varepsilon$ and $Y_j^* \ge \varepsilon$.

3. For some $j \in I$, we have $\Pi_N^*(T - \tau_j/a_N) \ne \Pi_N^*(T - \tau_{j+1}/a_N)$ and $Y_j < \varepsilon$.

4. For some $u \in [1, t + 1]$, we have $\Pi(u) \ne \Pi(u-)$ but $u$ does not equal $T_j^*$ for any $j$.

5. For some $j \in I$ with $j < L$, we have $Y_j \ge \varepsilon$, $Y_j^* \ge \varepsilon$, and $\Pi_N^*((T - \tau_j/a_N)-) = \Pi(T_j^*-)$, but $\Pi_N^*(T - \tau_j/a_N) \ne \Pi(T_j^*)$.

We now bound the probabilities of these five events. As for the first event, note that (8.1) and (4.11) imply that $P(\Pi_N(T - \tau_L/a_N) \ne \{\{1\}, \dots, \{n\}\}) \le C\varepsilon + CTn^2e^{-b}$. Combining this result with Lemma 8.2 and (4.1) gives

$$P\Big(\Pi_N^*\Big(T - \frac{\tau_L}{a_N}\Big) \ne \{\{1\}, \dots, \{n\}\}\Big) \le Cn^2\varepsilon.$$

By (5.4), we have $T - \tau_L/a_N \le 1 + 3/k_N$, so (6.4) implies $T - \tau_L^*/a_N \le 1 + 3/k_N + 10\delta T$. Because each pair of lineages in the Bolthausen-Sznitman coalescent merges at rate 1, it follows that for sufficiently large $N$,

$$P\Big(\Pi\Big(T - \frac{\tau_L^*}{a_N}\Big) \ne \{\{1\}, \dots, \{n\}\}\Big) \le \binom{n}{2}\Big(\frac{3}{k_N} + 10\delta T\Big) \le Cn^2\delta T.$$

It follows from Lemma 8.6 along with (4.11) and the fact that $\delta < \varepsilon^2$ by (5.1), that the probability that the second of the five events above occurs is at most $C\varepsilon T^2$. Likewise, it follows from Lemma 8.8 and (4.11) that the probability that the third of the five events occurs is bounded above by $CTn^2\varepsilon$.

Consider next the fourth event listed above. From the construction, this can only happen either if, for some $j \in I$, there are two points of $\Phi$ in $[T - \tau_{j+1}^*/a_N, T - \tau_j^*/a_N] \times [\varepsilon, 1] \times [0,1]^n$, or if there is some point $(u, y, z_1, \dots, z_n)$ in $\Phi$ in which $y < \varepsilon$ but two of the points $z_1, \dots, z_n$ are less than or equal to $y$. Recall that if $X$ has the Poisson distribution with mean $\lambda$, then $P(X \ge 2) \le \lambda^2$. Therefore, using also (6.2), the probability that, for some $j \in I$, there are two points of $\Phi$ in $[T - \tau_{j+1}^*/a_N, T - \tau_j^*/a_N] \times [\varepsilon, 1] \times [0,1]^n$ is bounded above by

$$\sum_{j \in I} \bigg(\frac{\tau_{j+1}^* - \tau_j^*}{a_N} \cdot \frac{1 - \varepsilon}{\varepsilon}\bigg)^2 \le \sum_{j \in I} \frac{C}{(\varepsilon k_N)^2} \le \frac{CT}{\varepsilon^2 k_N},$$

which tends to zero as $N \to \infty$. Note that if $y$ is the second coordinate of a point in $\Phi$, the probability that two of the points $z_1, \dots, z_n$ are less than or equal to $y$ is at most $\binom{n}{2} y^2$. Therefore, the probability that there is a point $(u, y, z_1, \dots, z_n)$ in $\Phi$ in which $y < \varepsilon$ but two of the points $z_1, \dots, z_n$ are less than or equal to $y$ is bounded above by

$$t \int_0^\varepsilon \binom{n}{2} y^2 \cdot y^{-2}\,dy = \binom{n}{2} t\varepsilon \le CTn^2\varepsilon.$$
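
The two elementary estimates used for the fourth event can be checked numerically. The Python sketch below is only a sanity check under stated assumptions: the function names are hypothetical, and the grid of means is an arbitrary choice. It uses the exact formula $P(X \ge 2) = 1 - e^{-\lambda}(1 + \lambda)$ for a Poisson random variable with mean $\lambda$.

```python
import math


def poisson_at_least_two(lam: float) -> float:
    """Exact value of P(X >= 2) when X is Poisson with mean lam."""
    return 1.0 - math.exp(-lam) * (1.0 + lam)


def small_point_bound(n: int, t: float, eps: float) -> float:
    """Value of t * integral_0^eps binom(n, 2) * y^2 * y^(-2) dy = binom(n, 2) * t * eps."""
    return math.comb(n, 2) * t * eps


if __name__ == "__main__":
    for lam in (0.05, 0.5, 1.0, 3.0):
        # Each exact probability should not exceed lam squared.
        print(lam, poisson_at_least_two(lam), lam ** 2)
    print(small_point_bound(5, 2.0, 0.01))
```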


Finally, consider the fifth of the possibilities above, which means that the coalescence at time $T - \tau_j/a_N$ in the process $\Pi_N^*$ does not match the coalescence that occurs at time $T_j^*$ in the process $\Pi$. One way this could happen would be if the time interval $[T - \tau_{j+1}^*/a_N, T - \tau_j^*/a_N]$ is not entirely contained in the interval $[1, t + 1]$. By (6.2) and (6.14), the number of $j \in I$ for which this interval is not contained in $[1, t + 1]$ is at most $C\delta Tk_N$. By Lemmas 7.4, 7.7, and 7.8, along with (4.11) and part 3 of Proposition 4.3, the probability that $Y_j \ge \varepsilon$ for some such $j$ is at most

$$C\delta Tk_N \cdot \frac{(1 - \varepsilon)(1 + 13\delta)}{(1 - 2\delta)k_N\varepsilon} + 2\varepsilon \le \frac{C\delta T}{\varepsilon} + C\varepsilon.$$

The other way that the coalescence at time $T - \tau_j/a_N$ in the process $\Pi_N^*$ might not match the coalescence that occurs at time $T_j^*$ in the process $\Pi$ would be if one of the random variables $Z_{j,1}, \dots, Z_{j,n}$ is between $Y_j$ and $Y_j^*$. By Lemma 8.7, the probability that this happens when $|Y_j - Y_j^*| > \varepsilon^2$ is bounded above by

$$\frac{C\delta T \log(1/\varepsilon)}{\varepsilon^2}.$$

Using Lemmas 7.4, 7.7, and 7.8, we see that the probability that this happens when $|Y_j - Y_j^*| \le \varepsilon^2$ is at most

$$\sum_{j \in I} \frac{C}{k_N\varepsilon} \cdot n\varepsilon^2 \le CTn\varepsilon.$$

Combining the bounds obtained above, we see that for sufficiently large $N$, the probability that (8.17) fails to hold is bounded above by

$$CTn^2\varepsilon + Cn^2\delta T + C\varepsilon T^2 + \frac{C\delta T \log(1/\varepsilon)}{\varepsilon^2}. \tag{8.18}$$

By Lemma 8.2, we can replace $\Pi_N^*$ by $\Pi_N$ in (8.17) and conclude that the probability that

$$\Pi\Big(T - \frac{\tau_j^*}{a_N}\Big) = \Pi_N\Big(T - \frac{\tau_j}{a_N}\Big) \quad \text{for all } j \in I \text{ with } j < L \tag{8.19}$$

fails to hold is also bounded above by the expression in (8.18) for sufficiently large $N$.

Now suppose that indeed (8.19) holds and $A$ occurs. Fix $i \in \{1, \dots, d\}$. Then there exists $j \in I$ such that $T - \tau_{j+1}/a_N \le t_i \le T - \tau_j/a_N$. By (4.9) and (6.4), for sufficiently large $N$,

$$T - \frac{\tau_j^*}{a_N} \le t_i + \frac{2}{k_N} + 10\delta T \le t_i + 11\delta T$$

and

$$T - \frac{\tau_{j+1}^*}{a_N} \ge t_i - \frac{2}{k_N} - 10\delta T \ge t_i - 11\delta T.$$

Thus, as long as $\Pi(t_i - 11\delta T) = \Pi(t_i + 11\delta T)$ and (8.19) holds, we must have $\Pi(t_i) = \Pi_N(t_i)$. However, because each pair of lineages in the Bolthausen-Sznitman coalescent merges at rate one, we have

$$P(\Pi(t_i - 11\delta T) \ne \Pi(t_i + 11\delta T)) \le \binom{n}{2} \cdot 22\delta T.$$
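
The fact that each pair of lineages merges at rate one can be verified with exact arithmetic. The Python sketch below is an illustration under stated assumptions: it uses the standard formula $\lambda_{b,k} = (k-2)!\,(b-k)!/(b-1)!$ for the rate at which a given set of $k$ out of $b$ blocks merges in the Bolthausen-Sznitman coalescent, which is a known fact not restated in this section, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def bs_rate(b: int, k: int) -> Fraction:
    """Rate at which a fixed set of k out of b blocks merges (2 <= k <= b)."""
    return Fraction(factorial(k - 2) * factorial(b - k), factorial(b - 1))


def pair_merge_rate(b: int) -> Fraction:
    """Total rate of mergers that put two fixed blocks (out of b) together."""
    return sum(comb(b - 2, k - 2) * bs_rate(b, k) for k in range(2, b + 1))


def total_merge_rate(b: int) -> Fraction:
    """Total rate of all mergers when there are b blocks."""
    return sum(comb(b, k) * bs_rate(b, k) for k in range(2, b + 1))


def change_probability_bound(n: int, h: Fraction) -> Fraction:
    """Union bound binom(n, 2) * 2h for a change of the partition in a window of length 2h."""
    return comb(n, 2) * 2 * h


if __name__ == "__main__":
    for b in range(2, 8):
        print(b, pair_merge_rate(b), total_merge_rate(b))
```

Taking `h` equal to $11\delta T$ in `change_probability_bound` reproduces the right-hand side $\binom{n}{2} \cdot 22\delta T$.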

Taking the union over $i \in \{1, \dots, d\}$ and using (8.18), it follows that for sufficiently large $N$,

$$P(\Pi_N(t_i) \ne \Pi(t_i) \text{ for some } i \in \{1, \dots, d\}) \le CTn^2\varepsilon + Cdn^2\delta T + C\varepsilon T^2 + \frac{C\delta T \log(1/\varepsilon)}{\varepsilon^2}.$$

Since $\delta < \varepsilon^3$ by (5.1) and $\varepsilon > 0$ can be chosen arbitrarily small for any fixed $T$, the theorem follows. □